[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken
[ https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066655#comment-17066655 ] Stefan Goldener edited comment on MAHOUT-2093 at 3/25/20, 2:54 PM:
---
In addition, lines 240-257 (in the current master ./bin/mahout) can potentially be removed by just adding:
{code:bash}
export CLASSPATH=$CLASSPATH:$(find $SPARK_HOME/jars -name '*.jar' -printf '%p:' | sed 's/:$//')
{code}
The duplicate jars then cause an error to be printed, but everything still works fine (at least for versions <= 0.13.0 in combination with Spark >= 2.4). One could also just exclude the three jars that cause the duplication error, e.g.:
{code:bash}
export CLASSPATH=$CLASSPATH:$(find $SPARK_HOME/jars -name '*.jar' -not -name 'netty-3.8.0.Final.jar' -printf '%p:' | sed 's/:$//')
{code}
This removes a lot of complexity that is not really necessary. There will only ever be one SPARK_HOME set anyway, and I don't think anyone builds Spark with multiple Scala versions in the same SPARK_HOME.
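A portability note on the one-liner above: `-printf` is a GNU `find` extension and is not available in BusyBox `find` (the default on Alpine images such as the one in the Dockerfile below). A minimal sketch of a variant that needs only POSIX shell; `JARS_DIR` and the touched jar names here are hypothetical stand-ins for a real `$SPARK_HOME/jars`:

```shell
# Sketch of a portable classpath builder that avoids GNU-only `find -printf`
# and skips a jar known to cause duplicate-class errors.
# JARS_DIR is a hypothetical stand-in for $SPARK_HOME/jars.
JARS_DIR="${JARS_DIR:-/tmp/demo-spark/jars}"
mkdir -p "$JARS_DIR"
touch "$JARS_DIR/spark-core_2.11-2.4.5.jar" "$JARS_DIR/netty-3.8.0.Final.jar"

CP=""
for jar in "$JARS_DIR"/*.jar; do
  case "$(basename "$jar")" in
    netty-3.8.0.Final.jar) continue ;;  # exclude the duplicated jar
  esac
  CP="${CP:+$CP:}$jar"                  # ':'-separated, no trailing ':'
done
export CLASSPATH="${CLASSPATH:+$CLASSPATH:}$CP"
echo "$CLASSPATH"
```

The `${CP:+$CP:}` expansion only emits the separator when `CP` is already non-empty, which is what the trailing `sed 's/:$//'` in the original one-liner works around.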
> Mahout Source Broken
>
> Key: MAHOUT-2093
> URL: https://issues.apache.org/jira/browse/MAHOUT-2093
> Project: Mahout
> Issue Type: Bug
> Components: Algorithms, Collaborative Filtering, Documentation
> Affects Versions: 0.14.0, 0.13.2, 14.1
> Reporter: Stefan Goldener
> Priority: Blocker
> Fix For: 14.1
> Attachments: image-2020-03-12-07-10-34-731.png
>
> It seems that newer versions of Mahout have problems with the Spark bindings: mahout spark-itemsimilarity and mahout spark-rowsimilarity do not work due to class-not-found exceptions.
> {code:java}
> Error: Could not find or load main class org.apache.mahout.drivers.RowSimilarityDriver
> {code}
> {code:java}
> Error: Could not find or load main class org.apache.mahout.drivers.ItemSimilarityDriver
> {code}
> whereas *mahout spark-shell* works flawlessly.
> Here is a short Dockerfile to reproduce the issue:
> {code:yaml}
> FROM openjdk:8-alpine
> ENV spark_uid=185
> ENV SCALA_MAJOR=2.11
> ENV SCALA_MAJOR_MINOR=2.11.12
> ENV HADOOP_MAJOR=2.7
> ENV SPARK_MAJOR_MINOR=2.4.5
> ENV MAHOUT_MAJOR_MINOR=0.14.0
> ENV MAHOUT_VERSION=mahout-${MAHOUT_MAJOR_MINOR}
> ENV MAHOUT_BASE=/opt/mahout
> ENV MAHOUT_HOME=${MAHOUT_BASE}/${MAHOUT_VERSION}
> ENV SPARK_VERSION=spark-${SPARK_MAJOR_MINOR}
> ENV SPARK_BASE=/opt/spark
> ENV SPARK_HOME=${SPARK_BASE}/${SPARK_VERSION}
> ENV MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"
> ENV SPARK_SRC_URL="https://archive.apache.org/dist/spark/${SPARK_VERSION}/${SPARK_VERSION}.tgz"
> ENV MAHOUT_SRC_URL="https://archive.apache.org/dist/mahout/${MAHOUT_MAJOR_MINOR}/mahout-${MAHOUT_MAJOR_MINOR}-source-release.zip"
> ENV ZINC_PORT=3030
>
> ### build spark
> RUN set -ex && \
>     apk upgrade --no-cache && \
>     ln -s /lib /lib64 && \
>     apk add --no-cache bash python py-pip tini libc6-compat linux-pam krb5 krb5-libs nss curl openssl git maven && \
>     pip install setuptools && \
>     mkdir -p ${MAHOUT_HOME} && \
>     mkdir -p ${SPARK_BASE} && \
>     curl -LfsS ${SPARK_SRC_URL} -o ${SPARK_HOME}.tgz && \
>     tar -xzvf ${SPARK_HOME}.tgz -C ${SPARK_BASE}/ && \
>     rm ${SPARK_HOME}.tgz && \
>     export PATH=$PATH:$MAHOUT_HOME/bin:$MAHOUT_HOME/lib:$SPARK_HOME/bin:$JAVA_HOME/bin && \
>     bash ${SPARK_HOME}/dev/change-scala-version.sh ${SCALA_MAJOR} && \
>     bash ${SPARK_HOME}/dev/make-distribution.sh --name ${DATE}-${REVISION} --pip --tgz -DzincPort=${ZINC_PORT} \
>         -Phadoop-${HADOOP_MAJOR} -Pkubernetes -Pkinesis-asl -Phive -Phive-thriftserver -Pscala-${SCALA_MAJOR}
>
> ### build mahout
> RUN curl -LfsS $MAHOUT_SRC_URL -o ${MAHOUT_BASE}.zip && \
>     unzip ${MAHOUT_BASE}.zip -d ${MAHOUT_BASE} && \
>     rm ${MAHOUT_BASE}.zip && \
>     cd ${MAHOUT_HOME} && \
>     mvn -Dspark.version=${SPARK_MAJOR_MINOR} -Dscala.version=${SCALA_MAJOR_MINOR} -Dscala.compat.version=${SCALA_MAJOR} -DskipTests -Dmaven.javadoc.skip=true clean package
> {code}
> {code:bash}
> docker build . -t mahout-test
> docker run -it mahout-test /bin/bash
> {code}
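Since the reported failure is "Could not find or load main class", a quick way to narrow it down after the Maven build in the Dockerfile is to check whether the driver classes were packaged into the Spark-bindings jar at all. A hedged sketch; the jar path is an assumption and should be adjusted to whatever `target/` actually contains:

```shell
# Hypothetical post-build check: does the given jar contain the driver class
# that fails to load? `unzip -l` lists archive entries without extracting.
check_driver() {
  jar="$1"
  if [ -f "$jar" ] && unzip -l "$jar" 2>/dev/null \
       | grep -q 'org/apache/mahout/drivers/ItemSimilarityDriver'; then
    echo "driver found in $jar"
  else
    echo "driver missing from ${jar:-<no jar>}"
  fi
}
# The path below is an assumption, not a confirmed build artifact name.
check_driver "${MAHOUT_HOME:-/opt/mahout}/spark/target/mahout-spark_2.11-0.14.0.jar"
```

If the class is present in the jar but the driver still fails to launch, the problem is the classpath construction in ./bin/mahout rather than the build.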
[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken
[ https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066401#comment-17066401 ] Stefan Goldener edited comment on MAHOUT-2093 at 3/25/20, 11:05 AM:
---
Why are docs errors showing up when I set -Dmaven.javadoc.skip=true? The command I call is the same as the one described in the cooccurrence tutorial ([https://mahout.apache.org/docs/latest/tutorials/intro-cooccurrence-spark/]), e.g.:
{code:bash}
mahout spark-itemsimilarity -i /tmp/tabl10/ -o /tmp/rec1itemout -rd ',' -f1 purchase -rc 0 -fc 1 -ic 2 -os -sem 10g -ma yarn -D:spark.dynamicAllocation.enabled=true -D:spark.shuffle.service.enabled=true
{code}
This works quite well with Mahout 0.13.0 and Spark 2.4.5 + Scala 2.11, so I do not think it is the command or the setup itself. With everything above 0.13.0 there seems to be the scopt option-parser issue (including master and 14.1-cleanup).

--
This message was sent by Atlassian Jira (v8.3.4#803005)
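For the scopt option-parser issue mentioned above, one way to see whether conflicting parser versions are in play is to list every scopt jar that could land on the classpath. A hypothetical diagnostic sketch; the directory defaults are assumptions:

```shell
# Hypothetical diagnostic: list jars matching a pattern under a directory,
# to spot duplicate or Scala-version-mismatched copies of scopt.
list_jars() {
  dir="$1"; pattern="$2"
  find "$dir" -name "$pattern" 2>/dev/null | sort
}
# Directory defaults below are assumptions, not confirmed install locations.
list_jars "${SPARK_HOME:-/opt/spark}/jars" 'scopt*.jar'
list_jars "${MAHOUT_HOME:-/opt/mahout}" 'scopt*.jar'
```

Two hits with different Scala suffixes (e.g. `_2.11` and `_2.12`) would point at a binary-compatibility clash rather than a problem with the command itself.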
[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken
[ https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066401#comment-17066401 ] Stefan Goldener edited comment on MAHOUT-2093 at 3/25/20, 5:53 AM: --- Why are docs erros showing up when i -Dmaven.javadoc._skip_=true? the command i call is the same as described in the cooccurence doc: [https://mahout.apache.org/docs/latest/tutorials/intro-cooccurrence-spark/] e.g.: {code:java} mahout spark-itemsimilarity -i /tmp/tabl10/ -o /tmp/rec1itemout -rd ',' -f1 purchase -rc 0 -fc 1 -ic 2 -os -sem 10g -ma yarn -D:spark.dynamicAllocation.enabled=true -D:spark.shuffle.service.enabled=true {code} this works quite well with mahout 0.13.0 and spark 2.4.5 + scala 2.11 so i do not think it is the command or setup itself was (Author: renedlog): Why are docs erros showing up when i -Dmaven.javadoc._skip_=true? the command i call is the same as described in the cooccurence doc: [https://mahout.apache.org/docs/latest/tutorials/intro-cooccurrence-spark/] e.g.: {code:java} mahout spark-itemsimilarity -i /tmp/tabl10/ -o /tmp/rec1itemout -rd ',' -f1 purchase -rc 0 -fc 1 -ic 2 -os -sem 10g -ma yarn -D:spark.dynamicAllocation.enabled=true -D:spark.shuffle.service.enabled=true{code} > Mahout Source Broken > > > Key: MAHOUT-2093 > URL: https://issues.apache.org/jira/browse/MAHOUT-2093 > Project: Mahout > Issue Type: Bug > Components: Algorithms, Collaborative Filtering, Documentation >Affects Versions: 0.14.0, 0.13.2, 14.1 >Reporter: Stefan Goldener >Priority: Blocker > Fix For: 14.1 > > Attachments: image-2020-03-12-07-10-34-731.png > > > Seems like newer versions of Mahout do have problems with spark bindings e.g. > mahout spark-itemsimilarity or mahout spark-rowsimilarity do not work due to > class not found exceptions. 
> {code:java} > Error: Could not find or load main class > org.apache.mahout.drivers.RowSimilarityDriver > {code} > {code:java} > Error: Could not find or load main class > org.apache.mahout.drivers.ItemSimilarityDriver > {code} > whereas *mahout spark-shell* works flawlessly. > Here is a short Dockerfile to show the issue: > {code:yaml} > FROM openjdk:8-alpine > ENV spark_uid=185 > ENV SCALA_MAJOR=2.11 > ENV SCALA_MAJOR_MINOR=2.11.12 > ENV HADOOP_MAJOR=2.7 > ENV SPARK_MAJOR_MINOR=2.4.5 > ENV MAHOUT_MAJOR_MINOR=0.14.0 > ENV MAHOUT_VERSION=mahout-${MAHOUT_MAJOR_MINOR} > ENV MAHOUT_BASE=/opt/mahout > ENV MAHOUT_HOME=${MAHOUT_BASE}/${MAHOUT_VERSION} > ENV SPARK_VERSION=spark-${SPARK_MAJOR_MINOR} > ENV SPARK_BASE=/opt/spark > ENV SPARK_HOME=${SPARK_BASE}/${SPARK_VERSION} > ENV MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g" > ENV > SPARK_SRC_URL="https://archive.apache.org/dist/spark/${SPARK_VERSION}/${SPARK_VERSION}.tgz; > ENV > MAHOUT_SRC_URL="https://archive.apache.org/dist/mahout/${MAHOUT_MAJOR_MINOR}/mahout-${MAHOUT_MAJOR_MINOR}-source-release.zip; > ENV ZINC_PORT=3030 > ### build spark > RUN set -ex && \ > apk upgrade --no-cache && \ > ln -s /lib /lib64 && \ > apk add --no-cache bash python py-pip tini libc6-compat linux-pam krb5 > krb5-libs nss curl openssl git maven && \ > pip install setuptools && \ > mkdir -p ${MAHOUT_HOME} && \ > mkdir -p ${SPARK_BASE} && \ > curl -LfsS ${SPARK_SRC_URL} -o ${SPARK_HOME}.tgz && \ > tar -xzvf ${SPARK_HOME}.tgz -C ${SPARK_BASE}/ && \ > rm ${SPARK_HOME}.tgz && \ > export > PATH=$PATH:$MAHOUT_HOME/bin:$MAHOUT_HOME/lib:$SPARK_HOME/bin:$JAVA_HOME/bin > && \ > bash ${SPARK_HOME}/dev/change-scala-version.sh ${SCALA_MAJOR} && \ > bash ${SPARK_HOME}/dev/make-distribution.sh --name ${DATE}-${REVISION} > --pip --tgz -DzincPort=${ZINC_PORT} \ > -Phadoop-${HADOOP_MAJOR} -Pkubernetes -Pkinesis-asl -Phive > -Phive-thriftserver -Pscala-${SCALA_MAJOR} > > ### build mahout > RUN curl -LfsS $MAHOUT_SRC_URL -o ${MAHOUT_BASE}.zip && \ > unzip 
${MAHOUT_BASE}.zip -d ${MAHOUT_BASE} && \ > rm ${MAHOUT_BASE}.zip && \ > cd ${MAHOUT_HOME} && \ > mvn -Dspark.version=${SPARK_MAJOR_MINOR} > -Dscala.version=${SCALA_MAJOR_MINOR} -Dscala.compat.version=${SCALA_MAJOR} > -DskipTests -Dmaven.javadoc.skip=true clean package > {code} > docker build . -t mahout-test > docker run -it mahout-test /bin/bash -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken
[ https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066401#comment-17066401 ] Stefan Goldener edited comment on MAHOUT-2093 at 3/25/20, 5:50 AM: ---
Why are docs errors showing up when I pass -Dmaven.javadoc.skip=true? The command I call is the same as the one described in the cooccurrence doc: [https://mahout.apache.org/docs/latest/tutorials/intro-cooccurrence-spark/] e.g.:
{code:bash}
mahout spark-itemsimilarity -i /tmp/tabl10/ -o /tmp/rec1itemout -rd ',' -f1 purchase -rc 0 -fc 1 -ic 2 -os -sem 10g -ma yarn -D:spark.dynamicAllocation.enabled=true -D:spark.shuffle.service.enabled=true
{code}
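The invocation above can be parameterized so that input, output, and master vary between runs. A sketch; the `itemsim_cmd` wrapper is hypothetical, and the flags simply mirror the command shown in the comment:

```shell
# Sketch: assemble the spark-itemsimilarity command line from a few
# variables. The itemsim_cmd wrapper is hypothetical; the flags are
# copied from the invocation in the comment above.
itemsim_cmd() {
    input=$1; output=$2; master=$3
    printf '%s\n' "mahout spark-itemsimilarity -i $input -o $output -rd ',' -f1 purchase -rc 0 -fc 1 -ic 2 -os -sem 10g -ma $master"
}

# e.g.: itemsim_cmd /tmp/tabl10/ /tmp/rec1itemout yarn
```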
[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken
[ https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066200#comment-17066200 ] Andrew Palumbo edited comment on MAHOUT-2093 at 3/24/20, 9:47 PM: --
bq. and the main problem is. this is not a warning it should be an error when building:
Those are actually warnings about the javadoc/scaladoc builds; they are broken links in the comments to classes which have been refactored. Java 8 throws errors when building apidocs, while Scala 2.11 just reports broken links as warnings during the build. I will go over my notes and post what I believe is the fix (and a potential fix which I'd started on a few weeks back) shortly. Thank you again for reporting this, and for your patience.
[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken
[ https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1706#comment-1706 ] Andrew Palumbo edited comment on MAHOUT-2093 at 3/24/20, 12:15 PM: ---
[~renedlog] I think I have identified the problem. I am just getting back to things after some time off for a minor medical procedure, and I would like to verify that my written response is correct; I will try to have an answer for you asap, and we can get a build out quickly. As for this being an RC: it is in fact the release that we are releasing, so we're still looking for a viable release candidate. We've updated to the newest Apache poms, which change the whole process of releasing as well. I haven't looked at this in a week or so; as I remember, it is a question of adding transitive dependencies (or even the dependencies themselves) to the class path, which is something we used to do with a provided fat jar. I'll go over my notes and let you know shortly. Thanks for reporting this; we'll get a fix out to master soon.
[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken
[ https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049916#comment-17049916 ] Andrew Palumbo edited comment on MAHOUT-2093 at 3/3/20 5:22 AM:
Thank you for reporting this, and for digging in, [~renedlog]. We usually test CLI issues out when we have a viable RC. This is interesting, though, because we do have CLI tests in our CI running in Spark pseudo-distributed mode (i.e. {{master=spark://localhost:7077}}), though this will not catch everything. I will do some digging tonight if I have a chance. I may be out for a bit, so I am going to leave this unassigned, but one of us will look shortly.
> Mahout Source Broken
>
> Key: MAHOUT-2093
> URL: https://issues.apache.org/jira/browse/MAHOUT-2093
> Project: Mahout
> Issue Type: Bug
> Components: Algorithms, Collaborative Filtering, Documentation
> Affects Versions: 0.14.0, 0.13.2, 14.1
> Reporter: Stefan Goldener
> Assignee: Andrew Palumbo
> Priority: Blocker
> Fix For: 14.1
[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken
[ https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048992#comment-17048992 ] Stefan Goldener edited comment on MAHOUT-2093 at 3/2/20 10:12 AM: --
Using the prebuilt Spark distribution, it is just the scopt/OptionParser error:
{code:yaml}
FROM openjdk:8-alpine
ENV spark_uid=185
ENV SCALA_MAJOR=2.11
ENV SCALA_MAJOR_MINOR=2.11.12
ENV HADOOP_MAJOR=2.7
ENV SPARK_MAJOR_MINOR=2.4.5
ENV MAHOUT_MAJOR_MINOR=14.1
ENV MAHOUT_VERSION=mahout-${MAHOUT_MAJOR_MINOR}
ENV MAHOUT_BASE=/opt/mahout
ENV MAHOUT_HOME=${MAHOUT_BASE}/${MAHOUT_VERSION}
ENV SPARK_VERSION=spark-${SPARK_MAJOR_MINOR}
ENV SPARK_BASE=/opt/spark
ENV SPARK_HOME=${SPARK_BASE}/${SPARK_VERSION}
ENV MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"
ENV ZINC_PORT=3030

### build spark
RUN apk add --no-cache curl bash openjdk8-jre python3 py-pip nss libc6-compat git unzip maven \
 && ln -s /lib64/ld-linux-x86-64.so.2 /lib/ld-linux-x86-64.so.2 \
 && mkdir -p ${MAHOUT_HOME} \
 && mkdir -p ${SPARK_BASE} \
 && wget https://archive.apache.org/dist/spark/spark-${SPARK_MAJOR_MINOR}/spark-${SPARK_MAJOR_MINOR}-bin-hadoop${HADOOP_MAJOR}.tgz \
 && tar -xvzf spark-${SPARK_MAJOR_MINOR}-bin-hadoop${HADOOP_MAJOR}.tgz -C ${SPARK_BASE}/ \
 && mv ${SPARK_BASE}/spark-${SPARK_MAJOR_MINOR}-bin-hadoop${HADOOP_MAJOR} ${SPARK_HOME} \
 && rm spark-${SPARK_MAJOR_MINOR}-bin-hadoop${HADOOP_MAJOR}.tgz

### build mahout
RUN git clone https://github.com/apache/mahout.git ${MAHOUT_HOME} && \
    cd ${MAHOUT_HOME} && \
    sed -i '257d' ./bin/mahout && \
    mvn -Dspark.version=${SPARK_MAJOR_MINOR} -Dscala.version=${SCALA_MAJOR_MINOR} -Dscala.compat.version=${SCALA_MAJOR} -DskipTests -Dmaven.javadoc.skip=true clean package
{code}
{code:bash}
bash-4.4# mahout spark-itemsimilarity
Adding lib/ to CLASSPATH
:/opt/mahout/mahout-14.1/lib/mahout-core_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-hdfs_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-spark-cli-drivers_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-spark_2.11-14.1-SNAPSHOT-dependency-reduced.jar:/opt/mahout/mahout-14.1/lib/mahout-spark_2.11-14.1-SNAPSHOT.jar:/opt/spark/spark-2.4.5/jars/*::/opt/mahout/mahout-14.1/bin/mahout-spark-class.sh
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: scopt/OptionParser
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
	at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
	at java.lang.Class.getMethod0(Class.java:3018)
	at java.lang.Class.getMethod(Class.java:1784)
	at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: scopt.OptionParser
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 19 more
{code}
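For the scopt/OptionParser ClassNotFoundException above, one thing to check is whether a scopt jar ended up anywhere the launcher can see it, and to append a local copy (e.g. from the Maven cache) to the CLASSPATH. A sketch; the Maven-cache path, the helper names, and the scopt artifact/version pattern are assumptions:

```shell
# Sketch: look for a jar matching a pattern under a directory and,
# if found, append it to CLASSPATH. The Maven cache location and the
# scopt artifact name below are illustrative assumptions.

find_jar() {
    # Print the first jar under directory $1 whose filename matches pattern $2.
    find "$1" -name "$2" 2>/dev/null | head -n 1
}

append_if_found() {
    jar=$(find_jar "$1" "$2")
    if [ -n "$jar" ]; then
        CLASSPATH="${CLASSPATH:+$CLASSPATH:}$jar"
    fi
    printf '%s\n' "$CLASSPATH"
}

# e.g.: CLASSPATH=$(append_if_found "$HOME/.m2/repository/com/github/scopt" 'scopt_2.11-*.jar')
```

If `append_if_found` prints an unchanged CLASSPATH, the jar was never resolved locally, which points at the build rather than the launcher script.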
[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken
[ https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048853#comment-17048853 ] Stefan Goldener edited comment on MAHOUT-2093 at 3/2/20 8:55 AM: -
What is really interesting... why do all tests run successfully (without -DskipTests)? It looks like the test environment differs from the real runtime environment. Here is the build with the main branch:
{code:yaml}
FROM openjdk:8-alpine
ENV spark_uid=185
ENV SCALA_MAJOR=2.11
ENV SCALA_MAJOR_MINOR=2.11.12
ENV HADOOP_MAJOR=2.7
ENV SPARK_MAJOR_MINOR=2.4.5
ENV MAHOUT_MAJOR_MINOR=14.1
ENV MAHOUT_VERSION=mahout-${MAHOUT_MAJOR_MINOR}
ENV MAHOUT_BASE=/opt/mahout
ENV MAHOUT_HOME=${MAHOUT_BASE}/${MAHOUT_VERSION}
ENV SPARK_VERSION=spark-${SPARK_MAJOR_MINOR}
ENV SPARK_BASE=/opt/spark
ENV SPARK_HOME=${SPARK_BASE}/${SPARK_VERSION}
ENV MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"
ENV SPARK_SRC_URL="https://archive.apache.org/dist/spark/${SPARK_VERSION}/${SPARK_VERSION}.tgz"
ENV ZINC_PORT=3030

### build spark
RUN set -ex && \
    apk upgrade --no-cache && \
    ln -s /lib /lib64 && \
    apk add --no-cache bash python py-pip tini libc6-compat linux-pam krb5 krb5-libs nss curl openssl git maven && \
    pip install setuptools && \
    mkdir -p ${MAHOUT_HOME} && \
    mkdir -p ${SPARK_BASE} && \
    curl -LfsS ${SPARK_SRC_URL} -o ${SPARK_HOME}.tgz && \
    tar -xzvf ${SPARK_HOME}.tgz -C ${SPARK_BASE}/ && \
    rm ${SPARK_HOME}.tgz && \
    export PATH=$PATH:$MAHOUT_HOME/bin:$MAHOUT_HOME/lib:$SPARK_HOME/bin:$JAVA_HOME/bin && \
    bash ${SPARK_HOME}/dev/change-scala-version.sh ${SCALA_MAJOR} && \
    bash ${SPARK_HOME}/dev/make-distribution.sh --name ${DATE}-${REVISION} --pip --tgz -DzincPort=${ZINC_PORT} \
      -Phadoop-${HADOOP_MAJOR} -Pkubernetes -Pkinesis-asl -Phive -Phive-thriftserver -Pscala-${SCALA_MAJOR}

### build mahout
RUN git clone https://github.com/apache/mahout.git ${MAHOUT_HOME} && \
    cd ${MAHOUT_HOME} && \
    sed -i '257d' ./bin/mahout && \
    mvn -Dspark.version=${SPARK_MAJOR_MINOR} -Dscala.version=${SCALA_MAJOR_MINOR} -Dscala.compat.version=${SCALA_MAJOR} -DskipTests -Dmaven.javadoc.skip=true clean package
{code}
Please note the *sed -i '257d' ./bin/mahout*. This is a fix for an issue in the main branch causing an error. In addition the scopt/OptionParser is now throwing an error:
{code:bash}
bash-4.4# ./bin/mahout spark-itemsimilarity
Adding lib/ to CLASSPATH
:/opt/mahout/mahout-14.1/lib/mahout-core_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-hdfs_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-spark-cli-drivers_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-spark_2.11-14.1-SNAPSHOT-dependency-reduced.jar:/opt/mahout/mahout-14.1/lib/mahout-spark_2.11-14.1-SNAPSHOT.jar:/opt/spark/spark-2.4.5/jars/*::/opt/mahout/mahout-14.1/bin/mahout-spark-class.sh
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: scopt/OptionParser
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
	at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
	at java.lang.Class.getMethod0(Class.java:3018)
	at java.lang.Class.getMethod(Class.java:1784)
	at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: scopt.OptionParser
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 19 more
{code}
The error alternates (sometimes the one above, sometimes the one below) with another error:
{code:bash}
bash-4.4# mahout spark-itemsimilarity
Adding lib/ to CLASSPATH
{code}
[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken
[ https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048853#comment-17048853 ] Stefan Goldener edited comment on MAHOUT-2093 at 3/2/20 8:44 AM: - What is really interesting... why do all test run successfully (without -DskipTests)? Looks like the tests seem different to the true environment. Here to build with main branch: {code:yaml} FROM openjdk:8-alpineENV spark_uid=185 ENV SCALA_MAJOR=2.11 ENV SCALA_MAJOR_MINOR=2.11.12 ENV HADOOP_MAJOR=2.7 ENV SPARK_MAJOR_MINOR=2.4.5 ENV MAHOUT_MAJOR_MINOR=14.1 ENV MAHOUT_VERSION=mahout-${MAHOUT_MAJOR_MINOR} ENV MAHOUT_BASE=/opt/mahout ENV MAHOUT_HOME=${MAHOUT_BASE}/${MAHOUT_VERSION} ENV SPARK_VERSION=spark-${SPARK_MAJOR_MINOR} ENV SPARK_BASE=/opt/spark ENV SPARK_HOME=${SPARK_BASE}/${SPARK_VERSION} ENV MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g" ENV SPARK_SRC_URL="https://archive.apache.org/dist/spark/${SPARK_VERSION}/${SPARK_VERSION}.tgz; ENV ZINC_PORT=3030 ### build spark RUN set -ex && \ apk upgrade --no-cache && \ ln -s /lib /lib64 && \ apk add --no-cache bash python py-pip tini libc6-compat linux-pam krb5 krb5-libs nss curl openssl git maven && \ pip install setuptools && \ mkdir -p ${MAHOUT_HOME} && \ mkdir -p ${SPARK_BASE} && \ curl -LfsS ${SPARK_SRC_URL} -o ${SPARK_HOME}.tgz && \ tar -xzvf ${SPARK_HOME}.tgz -C ${SPARK_BASE}/ && \ rm ${SPARK_HOME}.tgz && \ export PATH=$PATH:$MAHOUT_HOME/bin:$MAHOUT_HOME/lib:$SPARK_HOME/bin:$JAVA_HOME/bin && \ bash ${SPARK_HOME}/dev/change-scala-version.sh ${SCALA_MAJOR} && \ bash ${SPARK_HOME}/dev/make-distribution.sh --name ${DATE}-${REVISION} --pip --tgz -DzincPort=${ZINC_PORT} \ -Phadoop-${HADOOP_MAJOR} -Pkubernetes -Pkinesis-asl -Phive -Phive-thriftserver -Pscala-${SCALA_MAJOR} ### build mahout RUN git clone https://github.com/apache/mahout.git ${MAHOUT_HOME} && \ cd ${MAHOUT_HOME} && \ sed -i '257d' ./bin/mahout && \ mvn -Dspark.version=${SPARK_MAJOR_MINOR} -Dscala.version=${SCALA_MAJOR_MINOR} 
-Dscala.compat.version=${SCALA_MAJOR} -DskipTests -Dmaven.javadoc.skip=true clean package {code} Please note the *sed -i '257d' ./bin/mahout* This is a fix for an issue in the main branch causing an error. In addition the scopt/OptionParser is throwing now an error: {code:bash} bash-4.4# ./bin/mahout spark-itemsimilarity Adding lib/ to CLASSPATH :/opt/mahout/mahout-14.1/lib/mahout-core_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-hdfs_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-spark-cli-drivers_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-spark_2.11-14.1-SNAPSHOT-dependency-reduced.jar:/opt/mahout/mahout-14.1/lib/mahout-spark_2.11-14.1-SNAPSHOT.jar:/opt/spark/spark-2.4.5/jars/*::/opt/mahout/mahout-14.1/bin/mahout-spark-class.sh Error: A JNI error has occurred, please check your installation and try again Exception in thread "main" java.lang.NoClassDefFoundError: scopt/OptionParser at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:763) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) at java.net.URLClassLoader.access$100(URLClassLoader.java:74) at java.net.URLClassLoader$1.run(URLClassLoader.java:369) at java.net.URLClassLoader$1.run(URLClassLoader.java:363) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:362) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at java.lang.Class.getDeclaredMethods0(Native Method) at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) at java.lang.Class.privateGetMethodRecursive(Class.java:3048) at java.lang.Class.getMethod0(Class.java:3018) at java.lang.Class.getMethod(Class.java:1784) at 
sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544) at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526) Caused by: java.lang.ClassNotFoundException: scopt.OptionParser at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 19 more {code} was (Author: renedlog): What is really interesting... why do all test run successfully (without -DskipTests)? Looks like the tests seem different to the true environment. Here to build with main branch: {code:yaml} FROM
[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken
[ https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048853#comment-17048853 ] Stefan Goldener edited comment on MAHOUT-2093 at 3/2/20 8:43 AM: - What is really interesting: why do all tests run successfully (without -DskipTests)? The test environment appears to differ from the real runtime environment. Here is how to build from the main branch:
{code:yaml}
FROM openjdk:8-alpine
ENV spark_uid=185
ENV SCALA_MAJOR=2.11
ENV SCALA_MAJOR_MINOR=2.11.12
ENV HADOOP_MAJOR=2.7
ENV SPARK_MAJOR_MINOR=2.4.5
ENV MAHOUT_MAJOR_MINOR=14.1
ENV MAHOUT_VERSION=mahout-${MAHOUT_MAJOR_MINOR}
ENV MAHOUT_BASE=/opt/mahout
ENV MAHOUT_HOME=${MAHOUT_BASE}/${MAHOUT_VERSION}
ENV SPARK_VERSION=spark-${SPARK_MAJOR_MINOR}
ENV SPARK_BASE=/opt/spark
ENV SPARK_HOME=${SPARK_BASE}/${SPARK_VERSION}
ENV MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"
ENV SPARK_SRC_URL="https://archive.apache.org/dist/spark/${SPARK_VERSION}/${SPARK_VERSION}.tgz"
ENV ZINC_PORT=3030
### build spark
RUN set -ex && \
    apk upgrade --no-cache && \
    ln -s /lib /lib64 && \
    apk add --no-cache bash python py-pip tini libc6-compat linux-pam krb5 krb5-libs nss curl openssl git maven && \
    pip install setuptools && \
    mkdir -p ${MAHOUT_HOME} && \
    mkdir -p ${SPARK_BASE} && \
    curl -LfsS ${SPARK_SRC_URL} -o ${SPARK_HOME}.tgz && \
    tar -xzvf ${SPARK_HOME}.tgz -C ${SPARK_BASE}/ && \
    rm ${SPARK_HOME}.tgz && \
    export PATH=$PATH:$MAHOUT_HOME/bin:$MAHOUT_HOME/lib:$SPARK_HOME/bin:$JAVA_HOME/bin && \
    bash ${SPARK_HOME}/dev/change-scala-version.sh ${SCALA_MAJOR} && \
    bash ${SPARK_HOME}/dev/make-distribution.sh --name ${DATE}-${REVISION} --pip --tgz -DzincPort=${ZINC_PORT} \
        -Phadoop-${HADOOP_MAJOR} -Pkubernetes -Pkinesis-asl -Phive -Phive-thriftserver -Pscala-${SCALA_MAJOR}
### build mahout
RUN git clone https://github.com/apache/mahout.git ${MAHOUT_HOME} && \
    cd ${MAHOUT_HOME} && \
    sed -i '257d' ./bin/mahout && \
    mvn -Dspark.version=${SPARK_MAJOR_MINOR} -Dscala.version=${SCALA_MAJOR_MINOR} -Dscala.compat.version=${SCALA_MAJOR} -DskipTests -Dmaven.javadoc.skip=true clean package
{code}
Please note the *sed -i '257d' ./bin/mahout*. This works around an issue in the current main branch that causes an error. In addition, scopt/OptionParser is now throwing an error:
{code:bash}
bash-4.4# ./bin/mahout spark-itemsimilarity
Adding lib/ to CLASSPATH :/opt/mahout/mahout-14.1/lib/mahout-core_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-hdfs_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-spark-cli-drivers_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-spark_2.11-14.1-SNAPSHOT-dependency-reduced.jar:/opt/mahout/mahout-14.1/lib/mahout-spark_2.11-14.1-SNAPSHOT.jar:/opt/spark/spark-2.4.5/jars/*::/opt/mahout/mahout-14.1/bin/mahout-spark-class.sh
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: scopt/OptionParser
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
	at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
	at java.lang.Class.getMethod0(Class.java:3018)
	at java.lang.Class.getMethod(Class.java:1784)
	at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: scopt.OptionParser
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 19 more
{code}
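The NoClassDefFoundError above comes down to how the CLASSPATH gets assembled before the driver class is launched. As a rough illustration of the jar-joining idea discussed in this thread (not the actual bin/mahout script), a directory of jars can be collapsed into a single colon-separated classpath entry like this; the `build_classpath` helper and the throwaway demo directory are hypothetical names, and GNU find (with `-printf`) is assumed:

```shell
#!/usr/bin/env bash
# Sketch: join every jar under a directory into one colon-separated
# CLASSPATH fragment. "jars_dir" stands in for something like
# $SPARK_HOME/jars or $MAHOUT_HOME/lib.

build_classpath() {
  local jars_dir="$1"
  # -printf emits "path:" for each jar; sed strips the trailing colon.
  find "$jars_dir" -name '*.jar' -printf '%p:' | sed 's/:$//'
}

# Demo with a throwaway directory of empty jar files.
tmp=$(mktemp -d)
touch "$tmp/a.jar" "$tmp/b.jar" "$tmp/notes.txt"
build_classpath "$tmp"
rm -rf "$tmp"
```

If the jar providing scopt.OptionParser is not in the scanned directory, no amount of joining will help, which is why the missing dependency shows up only at launch time and not during the Maven build.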
[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken
[ https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048499#comment-17048499 ] Andrew Palumbo edited comment on MAHOUT-2093 at 3/1/20 11:01 AM: - [~renedlog] This is an issue with the Scopt 3.3.0 CLI interface. When we upgraded to Scala 2.11.x, version 3.3.0 started giving us problems, I believe due to some conflicting transitive dependencies, or something along those lines. We've upgraded to Scopt v3.7.1 in the current master for v14.1, which has solved the problem. The Mahout Spark Shell is actually handled differently in the call to {{/bin/mahout}} [1]: it is a pass-through to Spark's Scala shell, with the Mahout Spark-specific and abstract {{.jars}} added, so it does not use the Scopt CLI drivers [1][2][3], which is why the shell works without issue in that release. 0.14.0 was a huge refactor of the codebase; we are still moving Mahout-Hadoop MapReduce into the background, and we're still working out some of the kinks of this refactor in 14.1. I would suggest that you try the last RC, but I believe the distribution module was missing from the source distribution, which broke the build and was the reason we scrapped it. CLI drivers should be working in the current {{github/master}}: [https://github.com/apache/mahout.git], which is currently (mostly) stable. Thanks for reporting it.
[1] [https://github.com/apache/mahout/blob/branch-0.14.0/bin/mahout#L299-L314]
[2] [https://github.com/apache/mahout/blob/branch-0.14.0/community/spark-cli-drivers/src/main/scala/org/apache/mahout/drivers/ItemSimilarityDriver.scala#L44]
[3] [https://github.com/apache/mahout/blob/branch-0.14.0/community/spark-cli-drivers/src/main/scala/org/apache/mahout/drivers/MahoutOptionParser.scala#L30]
> Mahout Source Broken
>
> Key: MAHOUT-2093
> URL: https://issues.apache.org/jira/browse/MAHOUT-2093
> Project: Mahout
> Issue Type: Bug
> Components: Algorithms, Collaborative Filtering, Documentation
> Affects Versions: 0.14.0, 0.13.2
> Reporter: Stefan Goldener
> Priority: Blocker
>
> Seems like newer versions of Mahout do have problems with spark bindings, e.g. mahout spark-itemsimilarity or mahout spark-rowsimilarity do not work due to class not found exceptions.
> {code:java}
> Error: Could not find or load main class org.apache.mahout.drivers.RowSimilarityDriver
> {code}
> {code:java}
> Error: Could not find or load main class org.apache.mahout.drivers.ItemSimilarityDriver
> {code}
> whereas *mahout spark-shell* works flawlessly.
> Here is a short Dockerfile to show the issue:
> {code:yaml}
> FROM openjdk:8-alpine
> ENV spark_uid=185
> ENV SCALA_MAJOR=2.11
> ENV SCALA_MAJOR_MINOR=2.11.12
> ENV HADOOP_MAJOR=2.7
> ENV SPARK_MAJOR_MINOR=2.4.5
> ENV MAHOUT_MAJOR_MINOR=0.14.0
> ENV MAHOUT_VERSION=mahout-${MAHOUT_MAJOR_MINOR}
> ENV MAHOUT_BASE=/opt/mahout
> ENV MAHOUT_HOME=${MAHOUT_BASE}/${MAHOUT_VERSION}
> ENV SPARK_VERSION=spark-${SPARK_MAJOR_MINOR}
> ENV SPARK_BASE=/opt/spark
> ENV SPARK_HOME=${SPARK_BASE}/${SPARK_VERSION}
> ENV MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"
> ENV SPARK_SRC_URL="https://archive.apache.org/dist/spark/${SPARK_VERSION}/${SPARK_VERSION}.tgz"
> ENV MAHOUT_SRC_URL="https://archive.apache.org/dist/mahout/${MAHOUT_MAJOR_MINOR}/mahout-${MAHOUT_MAJOR_MINOR}-source-release.zip"
> ENV ZINC_PORT=3030
> ### build spark
> RUN set -ex && \
> apk upgrade --no-cache && \
> ln -s /lib /lib64 && \
> apk add --no-cache bash python py-pip tini libc6-compat linux-pam krb5 krb5-libs nss curl
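The split Andrew describes above, spark-shell as a pass-through to Spark's own Scala shell versus the Scopt-based CLI drivers launched directly on the JVM, can be sketched roughly as follows. The `dispatch` function and the echoed command strings are illustrative stand-ins, not the real {{bin/mahout}} logic:

```shell
#!/usr/bin/env bash
# Hedged sketch of a launcher dispatch: the shell subcommand hands off to
# Spark with the Mahout jars appended, while driver subcommands are plain
# "java -cp" launches and therefore need scopt on the CLASSPATH.

dispatch() {
  local cmd="$1"
  case "$cmd" in
    spark-shell)
      # pass-through: Spark's shell, Mahout jars added via --jars
      echo "spark-shell --jars \$MAHOUT_HOME/lib/*.jar"
      ;;
    spark-itemsimilarity)
      # CLI driver: direct JVM launch; scopt must be on CLASSPATH here
      echo "java -cp \$CLASSPATH org.apache.mahout.drivers.ItemSimilarityDriver"
      ;;
    *)
      echo "unknown command: $cmd" >&2
      return 1
      ;;
  esac
}

dispatch spark-shell
dispatch spark-itemsimilarity
```

This is why a broken Scopt dependency breaks spark-itemsimilarity and spark-rowsimilarity while leaving spark-shell untouched: only the second branch ever loads scopt.OptionParser.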