Re: [Openstack] Savana/Swift large object copy error

2015-05-06 Thread Christian Schwede
Hello Ross,

On 05.05.15 21:54, Ross Lillie wrote:
 My understanding is that Swift should automagically split files greater
 than 5 GB into multiple segments grouped under a manifest file, but this
 appears not to be working. It did work under the Havana release (Ubuntu)
 using the Swift file system jar file downloaded from the Mirantis web
 site. All current testing is based on the Juno release, performing a
 distcp with the openstack-hadoop jar file shipped as part of the latest
 Hadoop distros.

I don't know the client you're using, but Swift itself (on the server
side) never splits data into segments on its own, and it never has.
Currently it is up to the client to break the data into segments of at
most 5 GB each.
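
For illustration, this is roughly what a client-side segmented upload (a
"dynamic large object") looks like; the token, storage URL and names below
are placeholders, and 5 GB is the default max_file_size constraint that
makes the proxy answer a larger single PUT with 413:

# Upload each piece as an ordinary object, each below 5 GB:
curl -X PUT -H "X-Auth-Token: $TOKEN" -T piece-00 \
    $STORAGE_URL/backups_segments/bigfile/00
curl -X PUT -H "X-Auth-Token: $TOKEN" -T piece-01 \
    $STORAGE_URL/backups_segments/bigfile/01

# Then PUT a zero-byte manifest object pointing at the segment prefix;
# a GET on backups/bigfile returns the concatenated segments:
curl -X PUT -H "X-Auth-Token: $TOKEN" \
    -H "X-Object-Manifest: backups_segments/bigfile/" \
    --data-binary '' $STORAGE_URL/backups/bigfile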

There were some ideas in the past to implement this feature, but they
raised other problems. Have a look at the history of large objects:

http://docs.openstack.org/developer/swift/overview_large_objects.html#history

I would assume that a change in the client-side implementation altered
the behavior for you?

Best Regards,

Christian



[Openstack] Savana/Swift large object copy error

2015-05-05 Thread Ross Lillie
We're currently running OpenStack Juno and are experiencing errors when
performing large object copies between Hadoop HDFS and our Swift object
store. While not using the Savana service directly, we are relying on the
Swift file system extension for Hadoop created as part of the Savana
project.

In each case, the large object copy (using Hadoop's distcp) results in
Swift reporting an Error 413 - Request entity too large.

As a test case, I created a 5.5 GB file of random data and tried to upload
it to Swift using the Swift CLI. Once again Swift returned Error 413. If,
however, I explicitly set a segment size of 1 GB on the Swift command
line, the file uploads correctly.
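
For reference, the two uploads were roughly as follows (container and file
names trimmed; 1073741824 bytes = 1 GB):

# Plain upload of the 5.5 GB file -- rejected with Error 413:
swift upload backups random-5.5G.dat

# Same upload with an explicit 1 GB segment size -- the client splits the
# file into segments, writes a manifest object, and the upload succeeds:
swift upload --segment-size 1073741824 backups random-5.5G.dat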

When using Hadoop's distcp to move data from HDFS to Swift, the job always
exits with Swift reporting Error 413. Explicitly setting
fs.swift.service.x.partsize does not appear to make any difference.
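
The copy is invoked along these lines (paths, the service name "x" and the
partsize value are illustrative only; if I read the hadoop-openstack docs
correctly the value is in KB, so 1048576 would mean 1 GB):

# distcp accepts generic -D options; "x" is the Swift service name
# configured in core-site.xml:
hadoop distcp -Dfs.swift.service.x.partsize=1048576 \
    hdfs:///backups/bigfile.dat swift://backups.x/bigfile.dat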

My understanding is that Swift should automagically split files greater
than 5 GB into multiple segments grouped under a manifest file, but this
appears not to be working. It did work under the Havana release (Ubuntu)
using the Swift file system jar file downloaded from the Mirantis web
site. All current testing is based on the Juno release, performing a
distcp with the openstack-hadoop jar file shipped as part of the latest
Hadoop distros.

Has anyone else seen this behavior?

Thanks,
/ross

-- 
Ross Lillie
Application Software & Architecture Group


Re: [Openstack] Savana/Swift large object copy error

2015-05-05 Thread Ross Lillie
As a followup, when performing a distcp from HDFS to Swift, segments ARE
being created in the Swift container with a .distcp- prefix. Each temporary
object appears to correspond to an attempt of the map/reduce job.

Just as the last temporary segment appears in the remote container, the job
aborts, all of the .distcp- temporary objects are deleted, and Hadoop moves
on to the next attempt.

For example, for the currently running test case, the swift container
listing shows the following:

zantac:~ lillie$ swift list --lh backups
2.4G 2015-05-05 20:13:16 .distcp.tmp.attempt_1430771817173_0010_m_00_0/01
2.4G 2015-05-05 20:14:45 .distcp.tmp.attempt_1430771817173_0010_m_00_0/02
2.4G 2015-05-05 20:16:15 .distcp.tmp.attempt_1430771817173_0010_m_00_0/03
7.2G

Once the entire file is copied, the operation reports Error 413, and all
of the above files are deleted. It's as though the Swift file system isn't
able to close the file.
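
A couple of checks that might help narrow this down (sketch only; the
object name is whatever the destination file ends up being called):

# Confirm the cluster's single-object size limit (5 GB by default):
swift capabilities | grep max_file_size

# See whether a manifest object for the destination ever gets created; a
# dynamic large object should show a Manifest: line pointing at the
# segment prefix:
swift stat backups <destination-object>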


/ross

-- 
Ross Lillie
Application Software & Architecture Group
