Re: [openstack-dev] [MagnetoDB] Best practices for uploading large amounts of data
Hi Illia,

I would take a look at BSON: http://bsonspec.org/

Cheers,
Serge Kovaleff

On Thu, Mar 27, 2014 at 8:23 PM, Illia Khudoshyn ikhudos...@mirantis.com wrote:

Hi, Openstackers,

I'm currently working on adding bulk data load functionality to MagnetoDB. This functionality implies inserting huge amounts of data (billions of rows, gigabytes of data). The data being uploaded is a set of JSON documents (for now). The question I'm interested in is how to transport the data. For now I do a streaming HTTP POST request from the client side, with gevent.pywsgi on the server side.

Could anybody suggest a (better?) approach for the transport, please? What are the best practices for that?

Thanks in advance.

--
Best regards,
Illia Khudoshyn,
Software Engineer, Mirantis, Inc.
38, Lenina ave. Kharkov, Ukraine
www.mirantis.com http://www.mirantis.ru/ www.mirantis.ru
Skype: gluke_work
ikhudos...@mirantis.com

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
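To make the BSON suggestion above a bit more concrete, here is a minimal sketch of an encoder for flat documents, written from the layout described at bsonspec.org. The function name encode_bson and the sample row are illustrative assumptions and not part of MagnetoDB; in practice an existing BSON library would be the more likely choice. Note that BSON is not necessarily smaller than compact JSON. What it mainly offers is length-prefixed fields and native binary types.

```python
import json
import struct


def encode_bson(doc):
    """Encode a flat dict (str, bool, int32 and float values only) as BSON."""
    body = b""
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"  # element names are C strings
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # 0x02 string: int32 byte length (including trailing NUL), then bytes
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        elif isinstance(value, bool):  # checked before int: bool subclasses int
            body += b"\x08" + name + (b"\x01" if value else b"\x00")
        elif isinstance(value, int):
            # 0x10 int32 only; struct.pack raises struct.error if out of range
            body += b"\x10" + name + struct.pack("<i", value)
        elif isinstance(value, float):
            body += b"\x01" + name + struct.pack("<d", value)  # 0x01 double
        else:
            raise TypeError("unsupported value type: %r" % type(value))
    # document = int32 total length + elements + terminating NUL byte
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


# Hypothetical row, used only to compare the two encodings' sizes.
row = {"id": 1, "name": "alice"}
print("BSON bytes:", len(encode_bson(row)))
print("JSON bytes:", len(json.dumps(row, separators=(",", ":")).encode("utf-8")))
```

For this tiny row the BSON form comes out larger than the compact JSON form, so the choice is more about parsing cost and type fidelity than about bytes on the wire.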
Re: [openstack-dev] [MagnetoDB] Best practices for uploading large amounts of data
Hi guys,

I suggest taking a look at how Swift and Ceph do such things.

2014-03-28 12:33 GMT+02:00 Maksym Iarmak miar...@mirantis.com:

2014-03-28 11:29 GMT+02:00 Serge Kovaleff skoval...@mirantis.com:

Hi Illia, I would take a look at BSON: http://bsonspec.org/

Cheers,
Serge Kovaleff
Re: [openstack-dev] [MagnetoDB] Best practices for uploading large amounts of data
Maksym Iarmak wrote: I suggest taking a look at how Swift and Ceph do such things.

Under Swift (and Ceph, via the radosgw, which implements the Swift API) we are using POST and PUT, which has been working relatively well.

Chmouel
Re: [openstack-dev] [MagnetoDB] Best practices for uploading large amounts of data
On 03/28/2014 11:29 AM, Serge Kovaleff wrote: Hi Illia, I would take a look at BSON: http://bsonspec.org/

Hi Illia,

I guess if we are talking about Cassandra batch loading, the fastest way is to generate SSTables locally and load them into Cassandra via JMX or sstableloader: http://www.datastax.com/dev/blog/bulk-loading

If you want to implement bulk load via the MagnetoDB layer (not to Cassandra directly), you could try using a plain TCP socket and implementing your own binary protocol (using BSON, for example). HTTP is a text protocol, so using a TCP socket can help you avoid the overhead of base64 encoding.
In my opinion, working with HTTP and BSON is a doubtful solution, because you will use two-phase encoding and decoding: 1) object to BSON, 2) BSON to base64, 3) base64 to BSON, 4) BSON to object, instead of just 1) object to JSON, 2) JSON to object in the case of HTTP + JSON.

HTTP streaming, as far as I know, is an asynchronous kind of HTTP. You can expect a performance gain from skipping the generation of an HTTP response on the server side, and the wait for that response on the client side, for each chunk. But you still need to send almost the same amount of data. So if network throughput is your bottleneck, it doesn't help. If the server side is your bottleneck, it doesn't help either.

Also note that, in any case, the MagnetoDB Cassandra storage currently converts your data to a CQL query, which is also text.

It would be nice to implement the MagnetoDB BatchWriteItem operation via Cassandra SSTable generation and loading via sstableloader, but unfortunately, as far as I know, support for this functionality is implemented only in the Java world.

--
Best regards,
Dmitriy Ukhlov
Mirantis Inc.
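A rough sketch of the two points in the message above, assuming a hypothetical frame format of a 4-byte big-endian length followed by the raw payload. The names send_frame, recv_frame and roundtrip are made up for illustration. The last lines measure the base64 cost being discussed: base64 emits 4 output bytes for every 3 input bytes. As far as I know, an HTTP body can also carry raw bytes (for example with Content-Type: application/octet-stream), so that cost mainly applies when binary data has to be embedded in a text format such as JSON.

```python
import base64
import socket
import struct

# Assumed frame layout: 4-byte big-endian payload length, then the payload.
HEADER = struct.Struct("!I")


def send_frame(sock, payload):
    """Write one length-prefixed frame to the socket."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, count):
    """Read exactly count bytes; recv() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < count:
        chunk = sock.recv(count - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def recv_frame(sock):
    """Read one length-prefixed frame and return its payload."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


def roundtrip(payload):
    """Send a small payload through a local socket pair and read it back."""
    left, right = socket.socketpair()
    with left, right:
        send_frame(left, payload)
        return recv_frame(right)


payload = b"\x00\x01\xff" * 100  # 300 arbitrary binary bytes
print("framed bytes on the wire:", HEADER.size + len(payload))
print("base64 bytes for the same payload:", len(base64.b64encode(payload)))
```

The round trip here stays within one process and uses a small payload, so it says nothing about throughput; it only shows the framing logic and the size difference.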
Re: [openstack-dev] [MagnetoDB] Best practices for uploading large amounts of data
Dmitriy Ukhlov wrote: I guess if we are talking about Cassandra batch loading, the fastest way is to generate SSTables locally and load them into Cassandra via JMX or sstableloader: http://www.datastax.com/dev/blog/bulk-loading

Good idea, Dmitriy. IMHO bulk load is a back-end-specific task, so using specialized tools seems like a good idea to me.

Regards,
Alexander Chudnovets
Re: [openstack-dev] [MagnetoDB] Best practices for uploading large amounts of data
Bulk loading with sstableloader is blazingly fast (the price to pay is that it's not portable, of course). It's also network-efficient thanks to SSTable compression. If the network is not a limiting factor, then LZ4 will be great.

On Friday, 28 March 2014 at 13:46, Aleksandr Chudnovets achudnov...@mirantis.com wrote:

Good idea, Dmitriy. IMHO bulk load is a back-end-specific task, so using specialized tools seems like a good idea to me.

Regards,
Alexander Chudnovets
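To illustrate the remark above about compression saving network traffic, here is a small sketch that uses zlib from the Python standard library as a stand-in. LZ4 is not in the standard library, and this is not how SSTable compression is implemented. The sample rows and the function names are invented for the example, and real data will compress differently.

```python
import json
import zlib


def make_rows(count):
    """Build invented sample rows for the size comparison."""
    return [{"id": i, "name": "user%d" % i, "active": True} for i in range(count)]


def pack_rows(rows, level=6):
    """Serialize rows as newline-delimited JSON and compress them with zlib."""
    raw = "\n".join(json.dumps(row) for row in rows).encode("utf-8")
    return raw, zlib.compress(raw, level)


def unpack_rows(packed):
    """Reverse pack_rows: decompress, then parse one JSON document per line."""
    text = zlib.decompress(packed).decode("utf-8")
    return [json.loads(line) for line in text.split("\n")]


raw, packed = pack_rows(make_rows(1000))
print("raw bytes:", len(raw), "compressed bytes:", len(packed))
```

Whether the CPU spent on compression pays off depends on which side is the bottleneck, which is the trade-off the message above points at.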
[openstack-dev] [MagnetoDB] Best practices for uploading large amounts of data
Hi, Openstackers,

I'm currently working on adding bulk data load functionality to MagnetoDB. This functionality implies inserting huge amounts of data (billions of rows, gigabytes of data). The data being uploaded is a set of JSON documents (for now). The question I'm interested in is how to transport the data. For now I do a streaming HTTP POST request from the client side, with gevent.pywsgi on the server side.

Could anybody suggest a (better?) approach for the transport, please? What are the best practices for that?

Thanks in advance.

--
Best regards,
Illia Khudoshyn,
Software Engineer, Mirantis, Inc.
38, Lenina ave. Kharkov, Ukraine
www.mirantis.com http://www.mirantis.ru/ www.mirantis.ru
Skype: gluke_work
ikhudos...@mirantis.com
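For readers who want to picture the streaming approach described above, here is a minimal sketch of the body side only, assuming the rows travel as newline-delimited JSON. The wire format, the batch size and the names ndjson_chunks and parse_stream are assumptions for illustration, not MagnetoDB's actual format. The producer yields byte chunks that an HTTP client could send as a streamed request body, and the consumer parses rows no matter where the chunk boundaries fall.

```python
import json


def ndjson_chunks(rows, batch_size=1000):
    """Yield UTF-8 chunks, each with up to batch_size newline-terminated rows."""
    batch = []
    for row in rows:
        # json.dumps escapes newlines inside strings, so "\n" is a safe delimiter
        batch.append(json.dumps(row, separators=(",", ":")))
        if len(batch) >= batch_size:
            yield ("\n".join(batch) + "\n").encode("utf-8")
            batch = []
    if batch:
        yield ("\n".join(batch) + "\n").encode("utf-8")


def parse_stream(chunks):
    """Yield rows from byte chunks whose boundaries may fall mid-row."""
    pending = b""
    for chunk in chunks:
        pending += chunk
        # everything before the last newline is complete; keep the remainder
        *lines, pending = pending.split(b"\n")
        for line in lines:
            if line:
                yield json.loads(line)
    if pending.strip():
        yield json.loads(pending)


# Usage: only one batch is held in memory at a time on either side.
sample = ({"id": i, "value": "row-%d" % i} for i in range(5))
for row in parse_stream(ndjson_chunks(sample, batch_size=2)):
    print(row)
```

Keeping one row per line means the server can start inserting rows before the upload finishes, without buffering the whole request.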