[jira] [Commented] (SPARK-30542) Two Spark structured streaming jobs cannot write to same base path
[ https://issues.apache.org/jira/browse/SPARK-30542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713623#comment-17713623 ]

Wojciech Indyk commented on SPARK-30542:
----------------------------------------

Will be fixed by this PR: https://github.com/apache/spark/pull/40821

> Two Spark structured streaming jobs cannot write to same base path
> ------------------------------------------------------------------
>
>                 Key: SPARK-30542
>                 URL: https://issues.apache.org/jira/browse/SPARK-30542
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.3.0
>            Reporter: Sivakumar
>            Priority: Major
>
> Hi All,
> Spark Structured Streaming doesn't allow two structured streaming jobs to
> write data to the same base directory, which is possible with DStreams.
> Because a _spark_metadata directory is created by default for one job, the
> second job cannot use the same directory as its base path: the
> _spark_metadata directory already created by the other job causes it to
> throw an exception.
> Is there any workaround for this, other than creating separate base paths
> for both jobs?
> Is it possible to create the _spark_metadata directory elsewhere, or to
> disable it without any data loss?
> If I had to change the base path for both jobs, my whole framework would be
> impacted, so I don't want to do that.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30542) Two Spark structured streaming jobs cannot write to same base path
[ https://issues.apache.org/jira/browse/SPARK-30542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209425#comment-17209425 ]

Sachin Pasalkar commented on SPARK-30542:
-----------------------------------------

[~kabhwan] Can't we make this configurable?
[jira] [Commented] (SPARK-30542) Two Spark structured streaming jobs cannot write to same base path
[ https://issues.apache.org/jira/browse/SPARK-30542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209310#comment-17209310 ]

Jungtaek Lim commented on SPARK-30542:
--------------------------------------

This is a limitation, not a bug. There are known third-party alternatives
(A-Z order: Apache Hudi, Apache Iceberg, Delta Lake) that support multiple
jobs writing to the same path, so you may want to explore those.
[jira] [Commented] (SPARK-30542) Two Spark structured streaming jobs cannot write to same base path
[ https://issues.apache.org/jira/browse/SPARK-30542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209297#comment-17209297 ]

Sachin Pasalkar commented on SPARK-30542:
-----------------------------------------

[~SparkSiva] Did you get a response to it? I see it's a bug in the latest
release as well:
https://github.com/apache/spark/blob/5472170a2b35864c617bdb846ff7123533765a16/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala#L36

I see a hardcoded value, which is bound to fail for multiple jobs writing to
the same path.
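The hardcoded value referenced above is the `_spark_metadata` directory name in `FileStreamSink`. As a rough illustration of why two jobs collide, here is a minimal, hypothetical Python sketch (Spark's real implementation is in Scala and does much more, e.g. compacting the metadata log); the function name `start_file_sink` is invented for this example:

```python
import os
import tempfile

# Name of the sink metadata directory, hardcoded in Spark's FileStreamSink.
SPARK_METADATA_DIR = "_spark_metadata"

def start_file_sink(base_path: str, job_id: str) -> str:
    """Simplified illustration of the collision on a shared base path.

    Each streaming file sink keeps its log of committed files under
    <base_path>/_spark_metadata. If a second job used the same base path,
    it would read and corrupt the first job's log, so this sketch refuses
    to start when the directory already exists.
    """
    metadata_path = os.path.join(base_path, SPARK_METADATA_DIR)
    if os.path.exists(metadata_path):
        raise RuntimeError(
            f"{metadata_path} is already in use by another streaming job; "
            f"job {job_id!r} cannot share this base path"
        )
    os.makedirs(metadata_path)
    return metadata_path

base = tempfile.mkdtemp()
start_file_sink(base, "job-1")       # first job claims the metadata dir
try:
    start_file_sink(base, "job-2")   # second job fails on the same base path
except RuntimeError as e:
    print(e)
```

This is only a model of the behavior complained about in the thread, not Spark's code path; Spark actually detects the sink via the metadata directory when reading as well, which is why the name cannot simply differ per job without a config.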
[jira] [Commented] (SPARK-30542) Two Spark structured streaming jobs cannot write to same base path
[ https://issues.apache.org/jira/browse/SPARK-30542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024981#comment-17024981 ]

Sivakumar commented on SPARK-30542:
-----------------------------------

Sure, thanks Hyukjin.
[jira] [Commented] (SPARK-30542) Two Spark structured streaming jobs cannot write to same base path
[ https://issues.apache.org/jira/browse/SPARK-30542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017967#comment-17017967 ]

Sivakumar commented on SPARK-30542:
-----------------------------------

Hi Jungtaek,

I thought this might be a feature that should be added to Structured
Streaming. Also, please let me know if you have any workaround for this.
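One workaround commonly used for this situation (an assumption here, not something prescribed in this thread) is to give each job its own subdirectory under the base path, so each sink gets a private `_spark_metadata`, and to have downstream readers glob across the subdirectories. A minimal stdlib sketch of that layout, with invented job and file names:

```python
import glob
import os
import tempfile

# Hypothetical layout: each streaming job writes to its own subdirectory,
# so each gets a private _spark_metadata and the two never collide.
base = tempfile.mkdtemp()
for job in ("job_a", "job_b"):
    out_dir = os.path.join(base, job)
    os.makedirs(os.path.join(out_dir, "_spark_metadata"))
    # Stand-in for the Parquet part files a real sink would commit.
    with open(os.path.join(out_dir, "part-00000.parquet"), "w") as f:
        f.write(job)

# A downstream reader unions the outputs with a glob over the subdirectories
# (in Spark this would be roughly spark.read.parquet(f"{base}/*/")).
files = sorted(glob.glob(os.path.join(base, "*", "part-*.parquet")))
print([os.path.relpath(p, base) for p in files])
```

One trade-off to note: a glob read bypasses the per-sink `_spark_metadata` log, so the reader loses the file sink's exactly-once file listing across the union; whether that matters depends on the downstream consumer.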
[jira] [Commented] (SPARK-30542) Two Spark structured streaming jobs cannot write to same base path
[ https://issues.apache.org/jira/browse/SPARK-30542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017820#comment-17017820 ]

Sivakumar commented on SPARK-30542:
-----------------------------------

Earlier, with Spark DStreams, two jobs could share the same base path, but
with Spark Structured Streaming I don't have that flexibility. I think this
is a feature Structured Streaming should support.
[jira] [Commented] (SPARK-30542) Two Spark structured streaming jobs cannot write to same base path
[ https://issues.apache.org/jira/browse/SPARK-30542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017791#comment-17017791 ]

Jungtaek Lim commented on SPARK-30542:
--------------------------------------

This is more likely a question than an actual bug; questions like this are
encouraged to be posted to the user/dev mailing lists.