Re: [openstack-dev] [sahara] About Sahara Oozie plan and Spark CDH Issues
Hello everyone,

there is already some code in our repository:
https://github.com/bigfootproject/savanna-image-elements

I made the necessary changes to have the Spark element use the cdh5 element. I also updated to Spark 1.2. The old Cloudera HDFS-only element is still needed for generating cdh4 images (but cdh4 support can probably be dropped). Unfortunately I do not have the time to do the necessary testing/validation and submit for review.

I also changed the CDH element so that it can install only HDFS, if required. The changes I made are simple and all contained in the last commit on the master branch of that repo. The image generated with this code runs in Sahara without any further changes.

Feel free to take the code, clean it up and submit it for review.

Dan

On Wed, Jan 28, 2015 at 10:43:30AM -0500, Trevor McKay wrote:

Intel folks,

Belated welcome to Sahara! Thank you for your recent commits.

Moving this thread to openstack-dev so others may contribute, cc'ing Daniele and Pietro who pioneered the Spark plugin.

I'll respond with another email about Oozie work, but I want to address the Spark/Swift issue in CDH since I have been working on it and there is a task which still needs to be done -- that is, to upgrade the CDH version in the Spark image and see if the situation improves (see below).

Relevant reviews are here:
https://review.openstack.org/146659
https://review.openstack.org/147955
https://review.openstack.org/147985
https://review.openstack.org/146659

In the first review, you can see that we set an extra driver classpath to pull in '/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar'. This is because the spark-assembly JAR in CDH4 contains classes from jackson-mapper-asl-1.8.8 and jackson-core-asl-1.9.x. When the hadoop-swift.jar dereferences a Swift path, it calls into code from jackson-mapper-asl-1.8.8 which uses JsonClass. But JsonClass was removed in jackson-core-asl-1.9.x, so there is an exception.
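The workaround described above can be sketched as a small helper that builds a spark-submit command line with the Hadoop copy of jackson-core-asl on the driver classpath. This is only an illustration, not the code from the reviews: the helper name, the application jar, the main class and the Swift path are all hypothetical, and the reviews may set the classpath through a different mechanism. The only parts taken as given are the jar path quoted in the thread and the standard spark-submit options `--class` and `--driver-class-path`.

```python
import shlex

# Jar path quoted in the thread: the Hadoop copy of jackson-core-asl 1.8.8.
JACKSON_CORE_JAR = "/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar"


def build_spark_submit(app_jar, main_class, extra_driver_classpath=(), app_args=()):
    """Return the argv list for a spark-submit invocation.

    Hypothetical helper for illustration only; it does not run anything.
    """
    cmd = ["spark-submit", "--class", main_class]
    if extra_driver_classpath:
        # Classpath entries are joined with ':' as on a Linux host.
        cmd += ["--driver-class-path", ":".join(extra_driver_classpath)]
    cmd.append(app_jar)
    cmd.extend(app_args)
    return cmd


if __name__ == "__main__":
    argv = build_spark_submit(
        app_jar="wordcount.jar",                      # hypothetical application jar
        main_class="org.example.WordCount",           # hypothetical main class
        extra_driver_classpath=[JACKSON_CORE_JAR],
        app_args=["swift://container.sahara/input"],  # hypothetical Swift path
    )
    # Print a shell-safe rendering of the command for inspection.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the extra classpath as an optional argument makes it easy to drop once an image with consistent Jackson versions no longer needs it.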
Therefore, we need to use the classpath to either upgrade the version of jackson-mapper-asl to 1.9.x or downgrade the version of jackson-core-asl to 1.8.8 (both work in my testing). However, the first of these options requires us to bundle an extra jar. Since /usr/lib/hadoop already contains jackson-core-asl-1.8.8, it is easier to just add that to the classpath and downgrade the jackson-core-asl version.

Note that there are some references to this problem on the Spark mailing list; we are not the only ones to encounter it.

However, I am not completely comfortable with mixing versions and patching the classpath this way. It looks to me like the Spark assembly used in CDH5 has consistent versions, and I would like to try updating the CDH version in sahara-image-elements to CDH5 for Spark. If this fixes the problem and removes the need for the extra classpath, that would be great.

Would someone like to take on this change (modifying sahara-image-elements to use CDH5 for Spark images)? I can make a blueprint for it.

More to come about Oozie topics.

Best regards,
Trevor

On Thu, 2015-01-15 at 15:34 +, Chen, Weiting wrote:

Hi McKay.

We are the Intel team contributing to the OpenStack Sahara project. We are new to Sahara and would like to contribute more to this project. So far, we are focusing on the Sahara CDH plugin, so if there are any issues related to it, please feel free to discuss them with us.

During the IRC meeting, you mentioned two issues that we would like to discuss with you.

1. Oozie Workflow Support: Do you have any plans you could share with us? In our case, we are testing a Java action job with HBase library support and are also facing some problems with Oozie support, so it would be good to share experience with each other.

2. Spark CDH Issues: Could you provide more information about this issue? In the CDH plugin, we have used CDH 5 to complete the Swift test, so it should be fine to upgrade from CDH 4 to 5.
--
Daniele Venzano
http://www.brownhat.org

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [sahara] About Sahara Oozie plan and Spark CDH Issues
Daniele,

Excellent! I'll have to keep a closer eye on bigfoot activity :)

I'll pursue this.

Best,
Trevor