[jira] [Commented] (SPARK-30542) Two Spark structured streaming jobs cannot write to same base path

2023-04-18 Thread Wojciech Indyk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17713623#comment-17713623
 ] 

Wojciech Indyk commented on SPARK-30542:


Will be fixed by this PR: https://github.com/apache/spark/pull/40821

> Two Spark structured streaming jobs cannot write to same base path
> --
>
> Key: SPARK-30542
> URL: https://issues.apache.org/jira/browse/SPARK-30542
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Sivakumar
>Priority: Major
>
> Hi All,
> Spark Structured Streaming doesn't allow two structured streaming jobs to 
> write data to the same base directory which is possible with using dstreams.
> As __spark___metadata directory will be created by default for one job, 
> second job cannot use the same directory as base path as already 
> _spark__metadata directory is created by other job, It is throwing exception.
> Is there any workaround for this, other than creating separate base path's 
> for both the jobs.
> Is it possible to create the __spark__metadata directory else where or 
> disable without any data loss.
> If I had to change the base path for both the jobs, then my whole framework 
> will get impacted, So i don't want to do that.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30542) Two Spark structured streaming jobs cannot write to same base path

2020-10-07 Thread Sachin Pasalkar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17209425#comment-17209425
 ] 

Sachin Pasalkar commented on SPARK-30542:
-

[~kabhwan] Can't we make this configurable?

> Two Spark structured streaming jobs cannot write to same base path
> --
>
> Key: SPARK-30542
> URL: https://issues.apache.org/jira/browse/SPARK-30542
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Sivakumar
>Priority: Major
>
> Hi All,
> Spark Structured Streaming doesn't allow two structured streaming jobs to 
> write data to the same base directory which is possible with using dstreams.
> As __spark___metadata directory will be created by default for one job, 
> second job cannot use the same directory as base path as already 
> _spark__metadata directory is created by other job, It is throwing exception.
> Is there any workaround for this, other than creating separate base path's 
> for both the jobs.
> Is it possible to create the __spark__metadata directory else where or 
> disable without any data loss.
> If I had to change the base path for both the jobs, then my whole framework 
> will get impacted, So i don't want to do that.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30542) Two Spark structured streaming jobs cannot write to same base path

2020-10-07 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17209310#comment-17209310
 ] 

Jungtaek Lim commented on SPARK-30542:
--

This is a limitation, not a bug. There're known 3rd party alternatives (A-Z 
order: Apache Hudi, Apache Iceberg, Delta Lake) which support multiple jobs 
writing to the same path, so you may want to explore such things.

> Two Spark structured streaming jobs cannot write to same base path
> --
>
> Key: SPARK-30542
> URL: https://issues.apache.org/jira/browse/SPARK-30542
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Sivakumar
>Priority: Major
>
> Hi All,
> Spark Structured Streaming doesn't allow two structured streaming jobs to 
> write data to the same base directory which is possible with using dstreams.
> As __spark___metadata directory will be created by default for one job, 
> second job cannot use the same directory as base path as already 
> _spark__metadata directory is created by other job, It is throwing exception.
> Is there any workaround for this, other than creating separate base path's 
> for both the jobs.
> Is it possible to create the __spark__metadata directory else where or 
> disable without any data loss.
> If I had to change the base path for both the jobs, then my whole framework 
> will get impacted, So i don't want to do that.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30542) Two Spark structured streaming jobs cannot write to same base path

2020-10-06 Thread Sachin Pasalkar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17209297#comment-17209297
 ] 

Sachin Pasalkar commented on SPARK-30542:
-

[~SparkSiva] Did you get a response to it? I see it's a bug in the latest 
release as well

[https://github.com/apache/spark/blob/5472170a2b35864c617bdb846ff7123533765a16/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala#L36]
 

I see a hardcoded value which bounds to fail for multiple jobs writing to same 
path

> Two Spark structured streaming jobs cannot write to same base path
> --
>
> Key: SPARK-30542
> URL: https://issues.apache.org/jira/browse/SPARK-30542
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Sivakumar
>Priority: Major
>
> Hi All,
> Spark Structured Streaming doesn't allow two structured streaming jobs to 
> write data to the same base directory which is possible with using dstreams.
> As __spark___metadata directory will be created by default for one job, 
> second job cannot use the same directory as base path as already 
> _spark__metadata directory is created by other job, It is throwing exception.
> Is there any workaround for this, other than creating separate base path's 
> for both the jobs.
> Is it possible to create the __spark__metadata directory else where or 
> disable without any data loss.
> If I had to change the base path for both the jobs, then my whole framework 
> will get impacted, So i don't want to do that.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30542) Two Spark structured streaming jobs cannot write to same base path

2020-01-28 Thread Sivakumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17024981#comment-17024981
 ] 

Sivakumar commented on SPARK-30542:
---

Sure, Thanks Hyukjin

> Two Spark structured streaming jobs cannot write to same base path
> --
>
> Key: SPARK-30542
> URL: https://issues.apache.org/jira/browse/SPARK-30542
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Sivakumar
>Priority: Major
>
> Hi All,
> Spark Structured Streaming doesn't allow two structured streaming jobs to 
> write data to the same base directory which is possible with using dstreams.
> As __spark___metadata directory will be created by default for one job, 
> second job cannot use the same directory as base path as already 
> _spark__metadata directory is created by other job, It is throwing exception.
> Is there any workaround for this, other than creating separate base path's 
> for both the jobs.
> Is it possible to create the __spark__metadata directory else where or 
> disable without any data loss.
> If I had to change the base path for both the jobs, then my whole framework 
> will get impacted, So i don't want to do that.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30542) Two Spark structured streaming jobs cannot write to same base path

2020-01-17 Thread Sivakumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017967#comment-17017967
 ] 

Sivakumar commented on SPARK-30542:
---

Hi Jungtaek,

I thought this might be a feature that should be added to structured streaming. 
Also Please lemme know If you have any work around for this.

> Two Spark structured streaming jobs cannot write to same base path
> --
>
> Key: SPARK-30542
> URL: https://issues.apache.org/jira/browse/SPARK-30542
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Sivakumar
>Priority: Major
>
> Hi All,
> Spark Structured Streaming doesn't allow two structured streaming jobs to 
> write data to the same base directory which is possible with using dstreams.
> As __spark___metadata directory will be created by default for one job, 
> second job cannot use the same directory as base path as already 
> _spark__metadata directory is created by other job, It is throwing exception.
> Is there any workaround for this, other than creating separate base path's 
> for both the jobs.
> Is it possible to create the __spark__metadata directory else where or 
> disable without any data loss.
> If I had to change the base path for both the jobs, then my whole framework 
> will get impacted, So i don't want to do that.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30542) Two Spark structured streaming jobs cannot write to same base path

2020-01-17 Thread Sivakumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017820#comment-17017820
 ] 

Sivakumar commented on SPARK-30542:
---

Earlier with Spark Dstreams two jobs can have a same base path.

But with Spark structured streaming I don't have that flexibility. I guess this 
should be a feature that structured streaming should support.

> Two Spark structured streaming jobs cannot write to same base path
> --
>
> Key: SPARK-30542
> URL: https://issues.apache.org/jira/browse/SPARK-30542
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Sivakumar
>Priority: Major
>
> Hi All,
> Spark Structured Streaming doesn't allow two structured streaming jobs to 
> write data to the same base directory which is possible with using dstreams.
> As __spark___metadata directory will be created by default for one job, 
> second job cannot use the same directory as base path as already 
> _spark__metadata directory is created by other job, It is throwing exception.
> Is there any workaround for this, other than creating separate base path's 
> for both the jobs.
> Is it possible to create the __spark__metadata directory else where or 
> disable without any data loss.
> If I had to change the base path for both the jobs, then my whole framework 
> will get impacted, So i don't want to do that.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30542) Two Spark structured streaming jobs cannot write to same base path

2020-01-17 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017791#comment-17017791
 ] 

Jungtaek Lim commented on SPARK-30542:
--

This is more likely a question rather than actual bug which is encouraged to 
post user/dev mailing list to ask about.

> Two Spark structured streaming jobs cannot write to same base path
> --
>
> Key: SPARK-30542
> URL: https://issues.apache.org/jira/browse/SPARK-30542
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Sivakumar
>Priority: Major
>
> Hi All,
> I have two structured streaming jobs which should write data to the same base 
> directory.
> As __spark___metadata directory will be created by default for one job, 
> second job cannot use the same directory as base path as already 
> _spark__metadata directory is created by other job, It is throwing exception.
> Is there any workaround for this, other than creating separate base path's 
> for both the jobs.
> Is it possible to create the __spark__metadata directory else where or 
> disable without any data loss.
> If I had to change the base path for both the jobs, then my whole framework 
> will get impacted, So i don't want to do that.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org