Hi Nan,

A cleaner approach is to expose a RESTful service to the external system.

The external system then calls the service through the appropriate API.

For Scala, Spray can be used to build these services. Twitter's OSS projects
also include many examples of this service design.
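As a rough sketch of what I mean (assuming spray-can/spray-routing on the
classpath; the port, interface, and the /data/output directory are just
placeholders), the external system would GET a file by name instead of
reading HDFS part files directly:

```scala
import akka.actor.ActorSystem
import spray.routing.SimpleRoutingApp

// Minimal file-serving endpoint, a sketch only: the service reads whatever
// output your Spark job wrote locally and streams it to the caller.
object FileService extends App with SimpleRoutingApp {
  implicit val system = ActorSystem("file-service")

  startServer(interface = "localhost", port = 8080) {
    path("data" / Segment) { name =>
      get {
        // Placeholder directory; point this at your job's output location.
        getFromFile(s"/data/output/$name")
      }
    }
  }
}
```

The external system then just does `curl http://localhost:8080/data/result.csv`
and never needs to know about partitioned files.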

Thanks.
Deb



On Tue, Jan 7, 2014 at 10:25 AM, Nan Zhu <[email protected]> wrote:

>  Hi, all
>
> Thanks for the reply
>
> I actually need to provide a single file to an external system to process
> it…it seems that I have to make the consumer of the file support multiple
> inputs
>
> Best,
>
> --
> Nan Zhu
>
> On Tuesday, January 7, 2014 at 12:37 PM, Aaron Davidson wrote:
>
> HDFS, since 0.21 <https://issues.apache.org/jira/browse/HDFS-222>, has a
> concat() method which would do exactly this, but I am not sure of the
> performance implications. Of course, as Matei pointed out, it's unusual to
> actually need a single HDFS file.
>
>
> On Mon, Jan 6, 2014 at 9:08 PM, Matei Zaharia <[email protected]> wrote:
>
> Unfortunately this is expensive to do on HDFS — you’d need a single writer
> to write the whole file. If your file is small enough for that, you can use
> coalesce() on the RDD to bring all the data to one node, and then save it.
> However most HDFS applications work with directories containing multiple
> files instead of single files for this reason.
>
> Matei
>
> On Jan 6, 2014, at 10:56 PM, Nan Zhu <[email protected]> wrote:
>
> > Hi, all
> >
> > maybe a stupid question, but is there any way to make Spark write a
> single file instead of partitioned files?
> >
> > Best,
> >
> > --
> > Nan Zhu
> >
>
>
>
>
