[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken

2020-03-25 Thread Stefan Goldener (Jira)


[ https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066655#comment-17066655 ]

Stefan Goldener edited comment on MAHOUT-2093 at 3/25/20, 2:54 PM:
---

In addition, lines 240-257 (in the current master ./bin/mahout) could potentially 
be removed by simply adding:
{code:bash}
export CLASSPATH=$CLASSPATH:$(find $SPARK_HOME/jars -name '*.jar' -printf '%p:' | sed 's/:$//')
{code}
With this, the duplicate jars produce an error, but everything still works fine 
(at least for versions <= 0.13.0 in combination with Spark >= 2.4). One could 
also just exclude the three jars that cause the duplication error, e.g.:
{code:bash}
export CLASSPATH=$CLASSPATH:$(find $SPARK_HOME/jars -name '*.jar' -not -name 'netty-3.8.0.Final.jar' -printf '%p:' | sed 's/:$//')
{code}
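Only one of the three offending jars is named above; a minimal sketch of excluding several at once (the second jar name below is a hypothetical placeholder, not taken from this issue):

```shell
# Build the Spark jar list while skipping known duplicates.
# 'netty-3.8.0.Final.jar' is from the comment above; 'other-duplicate.jar'
# is a hypothetical placeholder for the remaining offenders.
SPARK_JARS=$(find "$SPARK_HOME/jars" -name '*.jar' \
  -not -name 'netty-3.8.0.Final.jar' \
  -not -name 'other-duplicate.jar' \
  -printf '%p:' | sed 's/:$//')
export CLASSPATH="$CLASSPATH:$SPARK_JARS"
```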
This removes a lot of complexity that is not really necessary. There will only 
ever be one SPARK_HOME set, and I don't think anyone builds Spark with multiple 
Scala versions in the same SPARK_HOME.
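As an aside, find's -printf is GNU-specific; the same colon-separated jar list can be built with a plain POSIX shell loop. A hedged sketch, assuming SPARK_HOME is already set:

```shell
# Collect $SPARK_HOME/jars/*.jar into one colon-separated list without
# relying on GNU find's -printf (works with BSD/macOS userland too).
jars=""
for j in "$SPARK_HOME"/jars/*.jar; do
  [ -e "$j" ] || continue            # glob matched nothing
  jars="${jars:+$jars:}$j"           # append with ':' separator
done
export CLASSPATH="${CLASSPATH:+$CLASSPATH:}$jars"
```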



> Mahout Source Broken
> 
>
> Key: MAHOUT-2093
> URL: https://issues.apache.org/jira/browse/MAHOUT-2093
> Project: Mahout
>  Issue Type: Bug
>  Components: Algorithms, Collaborative Filtering, Documentation
>Affects Versions: 0.14.0, 0.13.2, 14.1
>Reporter: Stefan Goldener
>Priority: Blocker
> Fix For: 14.1
>
> Attachments: image-2020-03-12-07-10-34-731.png
>
>
> It seems that newer versions of Mahout have problems with the Spark bindings: 
> e.g. mahout spark-itemsimilarity and mahout spark-rowsimilarity do not work 
> due to class-not-found exceptions. 
> {code:java}
> Error: Could not find or load main class 
> org.apache.mahout.drivers.RowSimilarityDriver
> {code}
> {code:java}
> Error: Could not find or load main class 
> org.apache.mahout.drivers.ItemSimilarityDriver
> {code}
> whereas *mahout spark-shell* works flawlessly.
> Here is a short Dockerfile to show the issue:
> {code}
> FROM openjdk:8-alpine
> ENV spark_uid=185
> ENV SCALA_MAJOR=2.11
> ENV SCALA_MAJOR_MINOR=2.11.12
> ENV HADOOP_MAJOR=2.7
> ENV SPARK_MAJOR_MINOR=2.4.5
> ENV MAHOUT_MAJOR_MINOR=0.14.0
> ENV MAHOUT_VERSION=mahout-${MAHOUT_MAJOR_MINOR}
> ENV MAHOUT_BASE=/opt/mahout
> ENV MAHOUT_HOME=${MAHOUT_BASE}/${MAHOUT_VERSION}
> ENV SPARK_VERSION=spark-${SPARK_MAJOR_MINOR}
> ENV SPARK_BASE=/opt/spark
> ENV SPARK_HOME=${SPARK_BASE}/${SPARK_VERSION}
> ENV MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"
> ENV SPARK_SRC_URL="https://archive.apache.org/dist/spark/${SPARK_VERSION}/${SPARK_VERSION}.tgz"
> ENV MAHOUT_SRC_URL="https://archive.apache.org/dist/mahout/${MAHOUT_MAJOR_MINOR}/mahout-${MAHOUT_MAJOR_MINOR}-source-release.zip"
> ENV ZINC_PORT=3030
> ### build spark
> RUN set -ex && \
>     apk upgrade --no-cache && \
>     ln -s /lib /lib64 && \
>     apk add --no-cache bash python py-pip tini libc6-compat linux-pam krb5 krb5-libs nss curl openssl git maven && \
>     pip install setuptools && \
>     mkdir -p ${MAHOUT_HOME} && \
>     mkdir -p ${SPARK_BASE} && \
>     curl -LfsS ${SPARK_SRC_URL} -o ${SPARK_HOME}.tgz && \
>     tar -xzvf ${SPARK_HOME}.tgz -C ${SPARK_BASE}/ && \
>     rm ${SPARK_HOME}.tgz && \
>     export PATH=$PATH:$MAHOUT_HOME/bin:$MAHOUT_HOME/lib:$SPARK_HOME/bin:$JAVA_HOME/bin && \
>     bash ${SPARK_HOME}/dev/change-scala-version.sh ${SCALA_MAJOR} && \
>     bash ${SPARK_HOME}/dev/make-distribution.sh --name ${DATE}-${REVISION} --pip --tgz -DzincPort=${ZINC_PORT} \
>         -Phadoop-${HADOOP_MAJOR} -Pkubernetes -Pkinesis-asl -Phive -Phive-thriftserver -Pscala-${SCALA_MAJOR}
> 
> ### build mahout
> RUN curl -LfsS $MAHOUT_SRC_URL -o ${MAHOUT_BASE}.zip && \
>     unzip ${MAHOUT_BASE}.zip -d ${MAHOUT_BASE} && \
>     rm ${MAHOUT_BASE}.zip && \
>     cd ${MAHOUT_HOME} && \
>     mvn -Dspark.version=${SPARK_MAJOR_MINOR} -Dscala.version=${SCALA_MAJOR_MINOR} -Dscala.compat.version=${SCALA_MAJOR} -DskipTests -Dmaven.javadoc.skip=true clean package
> {code}
> docker build . -t mahout-test
>  docker run -it mahout-test /bin/bash


[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken

2020-03-25 Thread Stefan Goldener (Jira)


[ https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066401#comment-17066401 ]

Stefan Goldener edited comment on MAHOUT-2093 at 3/25/20, 11:05 AM:


Why are docs errors showing up when I set -Dmaven.javadoc.skip=true?

The command I call is the same as described in the cooccurrence doc:
 [https://mahout.apache.org/docs/latest/tutorials/intro-cooccurrence-spark/]

e.g.:
{code:bash}
mahout spark-itemsimilarity -i /tmp/tabl10/ -o /tmp/rec1itemout -rd ',' -f1 purchase -rc 0 -fc 1 -ic 2 -os -sem 10g -ma yarn -D:spark.dynamicAllocation.enabled=true -D:spark.shuffle.service.enabled=true
{code}
This works quite well with Mahout 0.13.0 and Spark 2.4.5 + Scala 2.11, so I do 
not think it is the command or the setup itself.

For everything above 0.13.0 there seems to be the scopt option-parser issue 
(including master and 14.1-cleanup).






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken

2020-03-24 Thread Stefan Goldener (Jira)


[ 
https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066401#comment-17066401
 ] 

Stefan Goldener edited comment on MAHOUT-2093 at 3/25/20, 5:53 AM:
---

Why are docs erros showing up when i -Dmaven.javadoc._skip_=true?

the command i call is the same as described in the cooccurence doc:
 [https://mahout.apache.org/docs/latest/tutorials/intro-cooccurrence-spark/]

e.g.:
{code:java}
mahout spark-itemsimilarity -i /tmp/tabl10/ -o /tmp/rec1itemout -rd ',' -f1 
purchase -rc 0 -fc 1 -ic 2 -os -sem 10g -ma yarn 
-D:spark.dynamicAllocation.enabled=true -D:spark.shuffle.service.enabled=true
{code}
this works quite well with mahout 0.13.0 and spark 2.4.5 + scala 2.11
so i do not think it is the command or setup itself 


was (Author: renedlog):
Why are docs erros showing up when i -Dmaven.javadoc._skip_=true?

the command i call is the same as described in the cooccurence doc:
 [https://mahout.apache.org/docs/latest/tutorials/intro-cooccurrence-spark/]

e.g.:
{code:java}
mahout spark-itemsimilarity -i /tmp/tabl10/ -o /tmp/rec1itemout -rd ',' -f1 
purchase -rc 0 -fc 1 -ic 2 -os -sem 10g -ma yarn 
-D:spark.dynamicAllocation.enabled=true 
-D:spark.shuffle.service.enabled=true{code}

> Mahout Source Broken
> 
>
> Key: MAHOUT-2093
> URL: https://issues.apache.org/jira/browse/MAHOUT-2093
> Project: Mahout
>  Issue Type: Bug
>  Components: Algorithms, Collaborative Filtering, Documentation
>Affects Versions: 0.14.0, 0.13.2, 14.1
>Reporter: Stefan Goldener
>Priority: Blocker
> Fix For: 14.1
>
> Attachments: image-2020-03-12-07-10-34-731.png
>
>
> Seems like newer versions of Mahout do have problems with spark bindings e.g. 
> mahout spark-itemsimilarity or mahout spark-rowsimilarity do not work due to 
> class not found exceptions. 
> {code:java}
> Error: Could not find or load main class 
> org.apache.mahout.drivers.RowSimilarityDriver
> {code}
> {code:java}
> Error: Could not find or load main class 
> org.apache.mahout.drivers.ItemSimilarityDriver
> {code}
> whereas *mahout spark-shell* works flawlessly.
> Here is a short Dockerfile to show the issue:
> {code:yaml}
> FROM openjdk:8-alpine
> ENV spark_uid=185
> ENV SCALA_MAJOR=2.11
> ENV SCALA_MAJOR_MINOR=2.11.12
> ENV HADOOP_MAJOR=2.7
> ENV SPARK_MAJOR_MINOR=2.4.5
> ENV MAHOUT_MAJOR_MINOR=0.14.0
> ENV MAHOUT_VERSION=mahout-${MAHOUT_MAJOR_MINOR}
> ENV MAHOUT_BASE=/opt/mahout
> ENV MAHOUT_HOME=${MAHOUT_BASE}/${MAHOUT_VERSION}
> ENV SPARK_VERSION=spark-${SPARK_MAJOR_MINOR}
> ENV SPARK_BASE=/opt/spark
> ENV SPARK_HOME=${SPARK_BASE}/${SPARK_VERSION}
> ENV MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"
> ENV 
> SPARK_SRC_URL="https://archive.apache.org/dist/spark/${SPARK_VERSION}/${SPARK_VERSION}.tgz;
> ENV 
> MAHOUT_SRC_URL="https://archive.apache.org/dist/mahout/${MAHOUT_MAJOR_MINOR}/mahout-${MAHOUT_MAJOR_MINOR}-source-release.zip;
> ENV ZINC_PORT=3030
> ### build spark
> RUN set -ex && \
> apk upgrade --no-cache && \
> ln -s /lib /lib64 && \
> apk add --no-cache bash python py-pip tini libc6-compat linux-pam krb5 
> krb5-libs nss curl openssl git maven && \
> pip install setuptools && \
> mkdir -p ${MAHOUT_HOME} && \
> mkdir -p ${SPARK_BASE} && \
> curl  -LfsS ${SPARK_SRC_URL} -o ${SPARK_HOME}.tgz  && \
> tar -xzvf ${SPARK_HOME}.tgz -C ${SPARK_BASE}/ && \
> rm ${SPARK_HOME}.tgz && \
> export 
> PATH=$PATH:$MAHOUT_HOME/bin:$MAHOUT_HOME/lib:$SPARK_HOME/bin:$JAVA_HOME/bin 
> && \
> bash ${SPARK_HOME}/dev/change-scala-version.sh ${SCALA_MAJOR} && \
> bash ${SPARK_HOME}/dev/make-distribution.sh --name ${DATE}-${REVISION} 
> --pip --tgz -DzincPort=${ZINC_PORT} \
> -Phadoop-${HADOOP_MAJOR} -Pkubernetes -Pkinesis-asl -Phive 
> -Phive-thriftserver -Pscala-${SCALA_MAJOR}
> 
> ### build mahout
> RUN curl -LfsS $MAHOUT_SRC_URL -o ${MAHOUT_BASE}.zip  && \
> unzip ${MAHOUT_BASE}.zip -d ${MAHOUT_BASE} && \ 
> rm ${MAHOUT_BASE}.zip && \
> cd ${MAHOUT_HOME} && \
> mvn -Dspark.version=${SPARK_MAJOR_MINOR} 
> -Dscala.version=${SCALA_MAJOR_MINOR} -Dscala.compat.version=${SCALA_MAJOR} 
> -DskipTests -Dmaven.javadoc.skip=true clean package 
> {code}
> docker build . -t mahout-test
>  docker run -it mahout-test /bin/bash
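
A quick triage sketch for the `Could not find or load main class` failures above: check whether every CLASSPATH entry actually exists on disk. The paths below are fabricated so the snippet is self-contained; against a real install you would point CLASSPATH at the Mahout and Spark jars.

```shell
#!/bin/sh
# Sketch: flag CLASSPATH entries that do not exist on disk -- a common cause
# of "Could not find or load main class". Both demo entries are fabricated.
mkdir -p /tmp/cpcheck/lib
: > /tmp/cpcheck/lib/present.jar   # stand-in for a jar that was built
CLASSPATH="/tmp/cpcheck/lib/present.jar:/tmp/cpcheck/lib/absent.jar"
IFS=:
for entry in $CLASSPATH; do
  if [ -e "$entry" ]; then
    echo "ok:      $entry"
  else
    echo "missing: $entry"
  fi
done
```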



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken

2020-03-24 Thread Stefan Goldener (Jira)


[ 
https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066401#comment-17066401
 ] 

Stefan Goldener edited comment on MAHOUT-2093 at 3/25/20, 5:50 AM:
---

Why are javadoc errors showing up when I build with -Dmaven.javadoc.skip=true?

The command I run is the same as described in the cooccurrence doc:
 [https://mahout.apache.org/docs/latest/tutorials/intro-cooccurrence-spark/]

e.g.:
{code:java}
mahout spark-itemsimilarity -i /tmp/tabl10/ -o /tmp/rec1itemout -rd ',' -f1 
purchase -rc 0 -fc 1 -ic 2 -os -sem 10g -ma yarn 
-D:spark.dynamicAllocation.enabled=true 
-D:spark.shuffle.service.enabled=true{code}


was (Author: renedlog):
Why are javadoc errors showing up when I build with -Dmaven.javadoc.skip=true?

The command I run is the same as described in the cooccurrence doc:
[https://mahout.apache.org/docs/latest/tutorials/intro-cooccurrence-spark/]

e.g.:

> Mahout Source Broken
> 
>
> Key: MAHOUT-2093
> URL: https://issues.apache.org/jira/browse/MAHOUT-2093
> Project: Mahout
>  Issue Type: Bug
>  Components: Algorithms, Collaborative Filtering, Documentation
>Affects Versions: 0.14.0, 0.13.2, 14.1
>Reporter: Stefan Goldener
>Priority: Blocker
> Fix For: 14.1
>
> Attachments: image-2020-03-12-07-10-34-731.png
>
>
> Seems like newer versions of Mahout do have problems with spark bindings e.g. 
> mahout spark-itemsimilarity or mahout spark-rowsimilarity do not work due to 
> class not found exceptions. 
> {code:java}
> Error: Could not find or load main class 
> org.apache.mahout.drivers.RowSimilarityDriver
> {code}
> {code:java}
> Error: Could not find or load main class 
> org.apache.mahout.drivers.ItemSimilarityDriver
> {code}
> whereas *mahout spark-shell* works flawlessly.
> Here is a short Dockerfile to show the issue:
> {code:yaml}
> FROM openjdk:8-alpine
> ENV spark_uid=185
> ENV SCALA_MAJOR=2.11
> ENV SCALA_MAJOR_MINOR=2.11.12
> ENV HADOOP_MAJOR=2.7
> ENV SPARK_MAJOR_MINOR=2.4.5
> ENV MAHOUT_MAJOR_MINOR=0.14.0
> ENV MAHOUT_VERSION=mahout-${MAHOUT_MAJOR_MINOR}
> ENV MAHOUT_BASE=/opt/mahout
> ENV MAHOUT_HOME=${MAHOUT_BASE}/${MAHOUT_VERSION}
> ENV SPARK_VERSION=spark-${SPARK_MAJOR_MINOR}
> ENV SPARK_BASE=/opt/spark
> ENV SPARK_HOME=${SPARK_BASE}/${SPARK_VERSION}
> ENV MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"
> ENV 
> SPARK_SRC_URL="https://archive.apache.org/dist/spark/${SPARK_VERSION}/${SPARK_VERSION}.tgz"
> ENV 
> MAHOUT_SRC_URL="https://archive.apache.org/dist/mahout/${MAHOUT_MAJOR_MINOR}/mahout-${MAHOUT_MAJOR_MINOR}-source-release.zip"
> ENV ZINC_PORT=3030
> ### build spark
> RUN set -ex && \
> apk upgrade --no-cache && \
> ln -s /lib /lib64 && \
> apk add --no-cache bash python py-pip tini libc6-compat linux-pam krb5 
> krb5-libs nss curl openssl git maven && \
> pip install setuptools && \
> mkdir -p ${MAHOUT_HOME} && \
> mkdir -p ${SPARK_BASE} && \
> curl  -LfsS ${SPARK_SRC_URL} -o ${SPARK_HOME}.tgz  && \
> tar -xzvf ${SPARK_HOME}.tgz -C ${SPARK_BASE}/ && \
> rm ${SPARK_HOME}.tgz && \
> export 
> PATH=$PATH:$MAHOUT_HOME/bin:$MAHOUT_HOME/lib:$SPARK_HOME/bin:$JAVA_HOME/bin 
> && \
> bash ${SPARK_HOME}/dev/change-scala-version.sh ${SCALA_MAJOR} && \
> bash ${SPARK_HOME}/dev/make-distribution.sh --name ${DATE}-${REVISION} 
> --pip --tgz -DzincPort=${ZINC_PORT} \
> -Phadoop-${HADOOP_MAJOR} -Pkubernetes -Pkinesis-asl -Phive 
> -Phive-thriftserver -Pscala-${SCALA_MAJOR}
> 
> ### build mahout
> RUN curl -LfsS $MAHOUT_SRC_URL -o ${MAHOUT_BASE}.zip  && \
> unzip ${MAHOUT_BASE}.zip -d ${MAHOUT_BASE} && \ 
> rm ${MAHOUT_BASE}.zip && \
> cd ${MAHOUT_HOME} && \
> mvn -Dspark.version=${SPARK_MAJOR_MINOR} 
> -Dscala.version=${SCALA_MAJOR_MINOR} -Dscala.compat.version=${SCALA_MAJOR} 
> -DskipTests -Dmaven.javadoc.skip=true clean package 
> {code}
> docker build . -t mahout-test
>  docker run -it mahout-test /bin/bash





[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken

2020-03-24 Thread Andrew Palumbo (Jira)


[ 
https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066200#comment-17066200
 ] 

Andrew Palumbo edited comment on MAHOUT-2093 at 3/24/20, 9:47 PM:
--

bq. and the main problem is: this is not a warning, it should be an error when building:

Those are actually warnings from the javadoc/scaladoc builds: they are missing 
links in the comments to classes which have been refactored. Java 8 throws 
errors when building apidocs; Scala 2.11 just reports broken links as 
warnings during the build.

Thank you again for reporting. I will go over my notes and post what I believe 
is the fix shortly (along with a potential fix which I'd started on a few 
weeks back).

Thanks for your patience.




was (Author: andrew_palumbo):
> and the main problem is: this is not a warning, it should be an error when 
> building:

Those are actually warnings from the javadoc/scaladoc builds: they are missing 
links in the comments to classes which have been refactored. Java 8 throws 
errors when building apidocs; Scala 2.11 just reports broken links as 
warnings during the build.

Thank you again for reporting. I will go over my notes and post what I believe 
is the fix shortly (along with a potential fix which I'd started on a few 
weeks back).

Thanks for your patience.








[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken

2020-03-24 Thread Andrew Palumbo (Jira)


[ 
https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1706#comment-1706
 ] 

Andrew Palumbo edited comment on MAHOUT-2093 at 3/24/20, 12:15 PM:
---

[~renedlog] I think I have identified the problem. I am just getting back to 
things after some time off for a minor medical procedure, and would like to 
verify that my written response is correct. I will try to have an answer for 
you asap, and we can get a build out quickly. As for this being an RC: it is 
in fact the release that we are releasing, so we're still looking for a viable 
release candidate. We've updated to the newest Apache poms, which change the 
whole process of releasing as well. I actually haven't looked at this in a 
week or so; I'll try to catch up shortly. As I remember, it's a question of 
adding transitive dependencies (or even direct dependencies) to the classpath, 
which is something that we used to do with a provided fat jar. I'll go over my 
notes and let you know shortly.

Thanks for reporting this; we'll get a fix out to master soon. I do think I 
have identified it, and I hope to have a fix out for you shortly.
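
A sketch of the blanket classpath approach touched on above (adding every dependency jar rather than a curated whitelist). The directory layout is fabricated so the snippet runs anywhere; against a real install you would use the actual $SPARK_HOME:

```shell
#!/bin/sh
# Sketch: append every jar under $SPARK_HOME/jars to CLASSPATH, instead of
# listing individual transitive dependencies. SPARK_HOME is a throwaway
# directory here so the snippet is self-contained.
SPARK_HOME=/tmp/fakespark
mkdir -p "$SPARK_HOME/jars"
: > "$SPARK_HOME/jars/a.jar"
: > "$SPARK_HOME/jars/b.jar"
CP=$(find "$SPARK_HOME/jars" -name '*.jar' | sort | paste -s -d ':' -)
export CLASSPATH="${CLASSPATH:+$CLASSPATH:}$CP"
echo "$CP"
# prints /tmp/fakespark/jars/a.jar:/tmp/fakespark/jars/b.jar
```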


was (Author: andrew_palumbo):
[~renedlog] I think I have identified the problem. I am just getting back to 
things after some time off for a minor medical procedure, and would like to 
verify that my written response is correct. I will try to have an answer for 
you asap, and we can get a build out quickly. As for this being an RC: it is 
in fact the release that we are releasing, so we're still looking for a viable 
release candidate. We've updated to the newest Apache poms, which change the 
whole process of releasing as well. I actually haven't looked at this in a 
week or so; I'll try to catch up shortly. As I remember, it's a question of 
adding transitive dependencies (or even direct dependencies) to the classpath, 
which is something that we used to do with a provided fat jar. I'll go over my 
notes and let you know shortly.

Thanks for reporting this; we'll get a fix out soon. I do think I have 
identified it, and I hope to have a fix out for you shortly.


[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken

2020-03-02 Thread Andrew Palumbo (Jira)


[ 
https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049916#comment-17049916
 ] 

Andrew Palumbo edited comment on MAHOUT-2093 at 3/3/20 5:22 AM:


Thank you for reporting this, and for digging in, [~renedlog]. We usually test 
CLI issues out when we have a viable RC. This is interesting, though, because 
we do have CLI tests in our CI running in Spark pseudo-distributed mode (i.e. 
{{master=spark://localhost:7077}}), though this will not catch everything.

I will do some digging tonight if I have a chance. I may be out for a bit, so 
I am going to leave this unassigned, but one of us will look shortly.


was (Author: andrew_palumbo):
Thank you for reporting this, and for digging in, [~renedlog]. We usually test 
CLI issues out when we have a viable RC. This is interesting, though, because 
we do have CLI tests in our CI running in Spark pseudo-distributed mode (i.e. 
{{master=spark://localhost:7077}}), though this will not catch everything.

I may be out for a bit, but one of us will look shortly.

> Mahout Source Broken
> 
>
> Key: MAHOUT-2093
> URL: https://issues.apache.org/jira/browse/MAHOUT-2093
> Project: Mahout
>  Issue Type: Bug
>  Components: Algorithms, Collaborative Filtering, Documentation
>Affects Versions: 0.14.0, 0.13.2, 14.1
>Reporter: Stefan Goldener
>Assignee: Andrew Palumbo
>Priority: Blocker
> Fix For: 14.1
>
>





[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken

2020-03-02 Thread Stefan Goldener (Jira)


[ 
https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048992#comment-17048992
 ] 

Stefan Goldener edited comment on MAHOUT-2093 at 3/2/20 10:12 AM:
--

Using a prebuilt Spark, it is just the scopt/OptionParser error:

{code:yaml}
FROM openjdk:8-alpine

ENV spark_uid=185
ENV SCALA_MAJOR=2.11
ENV SCALA_MAJOR_MINOR=2.11.12
ENV HADOOP_MAJOR=2.7
ENV SPARK_MAJOR_MINOR=2.4.5
ENV MAHOUT_MAJOR_MINOR=14.1
ENV MAHOUT_VERSION=mahout-${MAHOUT_MAJOR_MINOR}
ENV MAHOUT_BASE=/opt/mahout
ENV MAHOUT_HOME=${MAHOUT_BASE}/${MAHOUT_VERSION}
ENV SPARK_VERSION=spark-${SPARK_MAJOR_MINOR}
ENV SPARK_BASE=/opt/spark
ENV SPARK_HOME=${SPARK_BASE}/${SPARK_VERSION}
ENV MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"
ENV ZINC_PORT=3030

### build spark
RUN apk add --no-cache curl bash openjdk8-jre python3 py-pip nss libc6-compat 
git unzip maven \
  && ln -s /lib64/ld-linux-x86-64.so.2 /lib/ld-linux-x86-64.so.2 \
  && mkdir -p ${MAHOUT_HOME} \
  && mkdir -p ${SPARK_BASE} \
  && wget 
https://archive.apache.org/dist/spark/spark-${SPARK_MAJOR_MINOR}/spark-${SPARK_MAJOR_MINOR}-bin-hadoop${HADOOP_MAJOR}.tgz
 \
  && tar -xvzf spark-${SPARK_MAJOR_MINOR}-bin-hadoop${HADOOP_MAJOR}.tgz -C 
${SPARK_BASE}/ \
  && mv ${SPARK_BASE}/spark-${SPARK_MAJOR_MINOR}-bin-hadoop${HADOOP_MAJOR} 
${SPARK_HOME} \
  && rm spark-${SPARK_MAJOR_MINOR}-bin-hadoop${HADOOP_MAJOR}.tgz

### build mahout
RUN git clone https://github.com/apache/mahout.git ${MAHOUT_HOME} && \
cd ${MAHOUT_HOME} && \
sed -i '257d' ./bin/mahout && \
mvn -Dspark.version=${SPARK_MAJOR_MINOR} 
-Dscala.version=${SCALA_MAJOR_MINOR} -Dscala.compat.version=${SCALA_MAJOR} 
-DskipTests -Dmaven.javadoc.skip=true clean package 
{code}

{code:bash}
bash-4.4# mahout spark-itemsimilarity
Adding lib/ to CLASSPATH
:/opt/mahout/mahout-14.1/lib/mahout-core_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-hdfs_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-spark-cli-drivers_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-spark_2.11-14.1-SNAPSHOT-dependency-reduced.jar:/opt/mahout/mahout-14.1/lib/mahout-spark_2.11-14.1-SNAPSHOT.jar:/opt/spark/spark-2.4.5/jars/*::/opt/mahout/mahout-14.1/bin/mahout-spark-class.sh
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: scopt/OptionParser
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at 
sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: scopt.OptionParser
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 19 more
{code}
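
One hedged workaround sketch for the NoClassDefFoundError above (not a confirmed fix from this thread): locate a scopt jar and append it to the classpath before launching. The /tmp/fake_m2 layout below is fabricated so the snippet is self-contained; against a real install you would search ~/.m2/repository or $SPARK_HOME/jars.

```shell
#!/bin/sh
# Hypothetical workaround: find a scopt jar and append it to CLASSPATH so
# scopt.OptionParser resolves at launch. /tmp/fake_m2 mimics a local Maven
# repo layout purely for illustration.
REPO=/tmp/fake_m2/com/github/scopt/scopt_2.11/3.7.1
mkdir -p "$REPO"
: > "$REPO/scopt_2.11-3.7.1.jar"
SCOPT_JAR=$(find /tmp/fake_m2 -name 'scopt_*.jar' | head -n 1)
if [ -n "$SCOPT_JAR" ]; then
  export CLASSPATH="${CLASSPATH:+$CLASSPATH:}$SCOPT_JAR"
  echo "appended $SCOPT_JAR"
else
  echo "no scopt jar found" >&2
fi
```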





was (Author: renedlog):
Using a prebuilt Spark, it is just the scopt/OptionParser error:

{code:yaml}
FROM openjdk:8-alpine

ENV spark_uid=185
ENV SCALA_MAJOR=2.11
ENV SCALA_MAJOR_MINOR=2.11.12
ENV HADOOP_MAJOR=2.7
ENV SPARK_MAJOR_MINOR=2.4.5
ENV MAHOUT_MAJOR_MINOR=14.1
ENV MAHOUT_VERSION=mahout-${MAHOUT_MAJOR_MINOR}
ENV MAHOUT_BASE=/opt/mahout
ENV MAHOUT_HOME=${MAHOUT_BASE}/${MAHOUT_VERSION}
ENV SPARK_VERSION=spark-${SPARK_MAJOR_MINOR}
ENV SPARK_BASE=/opt/spark
ENV SPARK_HOME=${SPARK_BASE}/${SPARK_VERSION}
ENV MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"
ENV ZINC_PORT=3030

### build spark
RUN apk add --no-cache curl bash openjdk8-jre python3 py-pip nss libc6-compat 
git unzip maven \
  && ln -s /lib64/ld-linux-x86-64.so.2 /lib/ld-linux-x86-64.so.2 \
  && mkdir -p ${MAHOUT_HOME} \
  && mkdir 

[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken

2020-03-02 Thread Stefan Goldener (Jira)


[ 
https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048853#comment-17048853
 ] 

Stefan Goldener edited comment on MAHOUT-2093 at 3/2/20 8:55 AM:
-

What is really interesting: why do all tests run successfully (without 
-DskipTests)? It looks like the test environment differs from the real 
runtime environment.

Here is a build with the main branch:
{code:yaml}
FROM openjdk:8-alpine

ENV spark_uid=185
ENV SCALA_MAJOR=2.11
ENV SCALA_MAJOR_MINOR=2.11.12
ENV HADOOP_MAJOR=2.7
ENV SPARK_MAJOR_MINOR=2.4.5
ENV MAHOUT_MAJOR_MINOR=14.1
ENV MAHOUT_VERSION=mahout-${MAHOUT_MAJOR_MINOR}
ENV MAHOUT_BASE=/opt/mahout
ENV MAHOUT_HOME=${MAHOUT_BASE}/${MAHOUT_VERSION}
ENV SPARK_VERSION=spark-${SPARK_MAJOR_MINOR}
ENV SPARK_BASE=/opt/spark
ENV SPARK_HOME=${SPARK_BASE}/${SPARK_VERSION}
ENV MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"
ENV 
SPARK_SRC_URL="https://archive.apache.org/dist/spark/${SPARK_VERSION}/${SPARK_VERSION}.tgz"
ENV ZINC_PORT=3030

### build spark
RUN set -ex && \
apk upgrade --no-cache && \
ln -s /lib /lib64 && \
apk add --no-cache bash python py-pip tini libc6-compat linux-pam krb5 
krb5-libs nss curl openssl git maven && \
pip install setuptools && \
mkdir -p ${MAHOUT_HOME} && \
mkdir -p ${SPARK_BASE} && \
curl  -LfsS ${SPARK_SRC_URL} -o ${SPARK_HOME}.tgz  && \
tar -xzvf ${SPARK_HOME}.tgz -C ${SPARK_BASE}/ && \
rm ${SPARK_HOME}.tgz && \
export 
PATH=$PATH:$MAHOUT_HOME/bin:$MAHOUT_HOME/lib:$SPARK_HOME/bin:$JAVA_HOME/bin && \
bash ${SPARK_HOME}/dev/change-scala-version.sh ${SCALA_MAJOR} && \
bash ${SPARK_HOME}/dev/make-distribution.sh --name ${DATE}-${REVISION} 
--pip --tgz -DzincPort=${ZINC_PORT} \
-Phadoop-${HADOOP_MAJOR} -Pkubernetes -Pkinesis-asl -Phive 
-Phive-thriftserver -Pscala-${SCALA_MAJOR}
 
### build mahout
RUN git clone https://github.com/apache/mahout.git ${MAHOUT_HOME} && \
cd ${MAHOUT_HOME} && \
sed -i '257d' ./bin/mahout && \
mvn -Dspark.version=${SPARK_MAJOR_MINOR} 
-Dscala.version=${SCALA_MAJOR_MINOR} -Dscala.compat.version=${SCALA_MAJOR} 
-DskipTests -Dmaven.javadoc.skip=true clean package
{code}
Please note the *sed -i '257d' ./bin/mahout* line.

This works around an issue in the current main branch's bin/mahout that 
causes an error.
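
The effect of that sed call can be shown on a small stand-in file; deleting line 3 here plays the role of deleting line 257 of ./bin/mahout:

```shell
#!/bin/sh
# Demonstrates what `sed -i '257d' ./bin/mahout` does: delete one line by
# number, in place. A 4-line stand-in file plays the role of bin/mahout.
printf 'line1\nline2\nbroken line\nline4\n' > /tmp/demo.txt
sed -i '3d' /tmp/demo.txt   # drop line 3, as '257d' drops line 257
cat /tmp/demo.txt
# line1
# line2
# line4
```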

In addition, scopt/OptionParser is now throwing an error:
{code:bash}
bash-4.4# ./bin/mahout spark-itemsimilarity
Adding lib/ to CLASSPATH
:/opt/mahout/mahout-14.1/lib/mahout-core_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-hdfs_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-spark-cli-drivers_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-spark_2.11-14.1-SNAPSHOT-dependency-reduced.jar:/opt/mahout/mahout-14.1/lib/mahout-spark_2.11-14.1-SNAPSHOT.jar:/opt/spark/spark-2.4.5/jars/*::/opt/mahout/mahout-14.1/bin/mahout-spark-class.sh
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: scopt/OptionParser
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at 
sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: scopt.OptionParser
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 19 more
{code}
The error alternates between the one above and the one below:
{code:bash}
bash-4.4# mahout spark-itemsimilarity
Adding lib/ to CLASSPATH

[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken

2020-03-02 Thread Stefan Goldener (Jira)


[ 
https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048853#comment-17048853
 ] 

Stefan Goldener edited comment on MAHOUT-2093 at 3/2/20 8:44 AM:
-

What is really interesting: why do all tests run successfully (without 
-DskipTests)? It looks like the test environment differs from the real 
runtime environment.

Here is a build with the main branch:
{code:yaml}
FROM openjdk:8-alpine

ENV spark_uid=185
ENV SCALA_MAJOR=2.11
ENV SCALA_MAJOR_MINOR=2.11.12
ENV HADOOP_MAJOR=2.7
ENV SPARK_MAJOR_MINOR=2.4.5
ENV MAHOUT_MAJOR_MINOR=14.1
ENV MAHOUT_VERSION=mahout-${MAHOUT_MAJOR_MINOR}
ENV MAHOUT_BASE=/opt/mahout
ENV MAHOUT_HOME=${MAHOUT_BASE}/${MAHOUT_VERSION}
ENV SPARK_VERSION=spark-${SPARK_MAJOR_MINOR}
ENV SPARK_BASE=/opt/spark
ENV SPARK_HOME=${SPARK_BASE}/${SPARK_VERSION}
ENV MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"
ENV 
SPARK_SRC_URL="https://archive.apache.org/dist/spark/${SPARK_VERSION}/${SPARK_VERSION}.tgz"
ENV ZINC_PORT=3030

### build spark
RUN set -ex && \
apk upgrade --no-cache && \
ln -s /lib /lib64 && \
apk add --no-cache bash python py-pip tini libc6-compat linux-pam krb5 
krb5-libs nss curl openssl git maven && \
pip install setuptools && \
mkdir -p ${MAHOUT_HOME} && \
mkdir -p ${SPARK_BASE} && \
curl  -LfsS ${SPARK_SRC_URL} -o ${SPARK_HOME}.tgz  && \
tar -xzvf ${SPARK_HOME}.tgz -C ${SPARK_BASE}/ && \
rm ${SPARK_HOME}.tgz && \
export 
PATH=$PATH:$MAHOUT_HOME/bin:$MAHOUT_HOME/lib:$SPARK_HOME/bin:$JAVA_HOME/bin && \
bash ${SPARK_HOME}/dev/change-scala-version.sh ${SCALA_MAJOR} && \
bash ${SPARK_HOME}/dev/make-distribution.sh --name ${DATE}-${REVISION} 
--pip --tgz -DzincPort=${ZINC_PORT} \
-Phadoop-${HADOOP_MAJOR} -Pkubernetes -Pkinesis-asl -Phive 
-Phive-thriftserver -Pscala-${SCALA_MAJOR}
 
### build mahout
RUN git clone https://github.com/apache/mahout.git ${MAHOUT_HOME} && \
cd ${MAHOUT_HOME} && \
sed -i '257d' ./bin/mahout && \
mvn -Dspark.version=${SPARK_MAJOR_MINOR} 
-Dscala.version=${SCALA_MAJOR_MINOR} -Dscala.compat.version=${SCALA_MAJOR} 
-DskipTests -Dmaven.javadoc.skip=true clean package
{code}
Please note the *sed -i '257d' ./bin/mahout* line.

This works around an issue in the current main branch's bin/mahout that 
causes an error.

In addition, scopt/OptionParser is now throwing an error:
{code:bash}
bash-4.4# ./bin/mahout spark-itemsimilarity
Adding lib/ to CLASSPATH
:/opt/mahout/mahout-14.1/lib/mahout-core_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-hdfs_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-spark-cli-drivers_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-spark_2.11-14.1-SNAPSHOT-dependency-reduced.jar:/opt/mahout/mahout-14.1/lib/mahout-spark_2.11-14.1-SNAPSHOT.jar:/opt/spark/spark-2.4.5/jars/*::/opt/mahout/mahout-14.1/bin/mahout-spark-class.sh
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: scopt/OptionParser
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at 
sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: scopt.OptionParser
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 19 more
{code}


was (Author: renedlog):
What is really interesting: why do all tests run successfully (without 
-DskipTests)? It looks like the test environment differs from the real 
runtime environment.

Here is a build with the main branch:
{code:yaml}
FROM 

[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken

2020-03-02 Thread Stefan Goldener (Jira)


[ 
https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048853#comment-17048853
 ] 

Stefan Goldener edited comment on MAHOUT-2093 at 3/2/20 8:43 AM:
-

What is really interesting... why do all tests run successfully (i.e. without 
-DskipTests)? It looks like the test environment differs from the real runtime 
environment.

Here is how to build with the main branch:
{code:yaml}
FROM openjdk:8-alpine
ENV spark_uid=185
ENV SCALA_MAJOR=2.11
ENV SCALA_MAJOR_MINOR=2.11.12
ENV HADOOP_MAJOR=2.7
ENV SPARK_MAJOR_MINOR=2.4.5
ENV MAHOUT_MAJOR_MINOR=14.1
ENV MAHOUT_VERSION=mahout-${MAHOUT_MAJOR_MINOR}
ENV MAHOUT_BASE=/opt/mahout
ENV MAHOUT_HOME=${MAHOUT_BASE}/${MAHOUT_VERSION}
ENV SPARK_VERSION=spark-${SPARK_MAJOR_MINOR}
ENV SPARK_BASE=/opt/spark
ENV SPARK_HOME=${SPARK_BASE}/${SPARK_VERSION}
ENV MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"
ENV SPARK_SRC_URL="https://archive.apache.org/dist/spark/${SPARK_VERSION}/${SPARK_VERSION}.tgz"
ENV ZINC_PORT=3030

### build spark
RUN set -ex && \
apk upgrade --no-cache && \
ln -s /lib /lib64 && \
apk add --no-cache bash python py-pip tini libc6-compat linux-pam krb5 
krb5-libs nss curl openssl git maven && \
pip install setuptools && \
mkdir -p ${MAHOUT_HOME} && \
mkdir -p ${SPARK_BASE} && \
curl  -LfsS ${SPARK_SRC_URL} -o ${SPARK_HOME}.tgz  && \
tar -xzvf ${SPARK_HOME}.tgz -C ${SPARK_BASE}/ && \
rm ${SPARK_HOME}.tgz && \
export 
PATH=$PATH:$MAHOUT_HOME/bin:$MAHOUT_HOME/lib:$SPARK_HOME/bin:$JAVA_HOME/bin && \
bash ${SPARK_HOME}/dev/change-scala-version.sh ${SCALA_MAJOR} && \
bash ${SPARK_HOME}/dev/make-distribution.sh --name ${DATE}-${REVISION} 
--pip --tgz -DzincPort=${ZINC_PORT} \
-Phadoop-${HADOOP_MAJOR} -Pkubernetes -Pkinesis-asl -Phive 
-Phive-thriftserver -Pscala-${SCALA_MAJOR}
 
### build mahout
RUN git clone https://github.com/apache/mahout.git ${MAHOUT_HOME} && \
cd ${MAHOUT_HOME} && \
sed -i '257d' ./bin/mahout && \
mvn -Dspark.version=${SPARK_MAJOR_MINOR} 
-Dscala.version=${SCALA_MAJOR_MINOR} -Dscala.compat.version=${SCALA_MAJOR} 
-DskipTests -Dmaven.javadoc.skip=true clean package
{code}
Please note the *sed -i '257d' ./bin/mahout* line.

It works around an issue in the current main branch that otherwise causes an 
error.
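Since the workaround deletes a hard-coded line number, it is worth previewing 
that line before removing it, so the deletion does not hit the wrong statement 
if the script changes. A minimal sketch with GNU sed (the /tmp demo file is an 
illustrative stand-in for ./bin/mahout):
{code:bash}
# Demo file standing in for bin/mahout (illustrative only).
printf 'line1\nline2\nline3\n' > /tmp/demo-script

# Preview the line that '2d' would delete, then delete it in place.
sed -n '2p' /tmp/demo-script
sed -i '2d' /tmp/demo-script
cat /tmp/demo-script
{code}
The same pattern applies to the real script: run *sed -n '257p' ./bin/mahout* 
before *sed -i '257d' ./bin/mahout*.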

In addition, scopt/OptionParser now throws an error:
{code:bash}
bash-4.4# ./bin/mahout spark-itemsimilarity
Adding lib/ to CLASSPATH
:/opt/mahout/mahout-14.1/lib/mahout-core_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-hdfs_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-spark-cli-drivers_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-spark_2.11-14.1-SNAPSHOT-dependency-reduced.jar:/opt/mahout/mahout-14.1/lib/mahout-spark_2.11-14.1-SNAPSHOT.jar:/opt/spark/spark-2.4.5/jars/*::/opt/mahout/mahout-14.1/bin/mahout-spark-class.sh
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: scopt/OptionParser
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at 
sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: scopt.OptionParser
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 19 more
{code}
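The empty entries in the classpath printed above (the leading ':' and the '::' 
before the script path) come from unset path variables being joined. A hedged 
sketch of collapsing a whole jar directory into one classpath string with GNU 
find (busybox find on Alpine lacks -printf; the /tmp paths are illustrative 
stand-ins for $SPARK_HOME/jars):
{code:bash}
# Throwaway directory standing in for $SPARK_HOME/jars (illustrative).
mkdir -p /tmp/demo-jars
touch /tmp/demo-jars/a.jar /tmp/demo-jars/b.jar

# Join every jar into one ':'-separated entry; sed strips the trailing ':'.
CP=$(find /tmp/demo-jars -name '*.jar' -printf '%p:' | sed 's/:$//')
echo "$CP"
{code}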


was (Author: renedlog):
What is really interesting... why do all tests run successfully (i.e. without 
-DskipTests)? It looks like the test environment differs from the real runtime 
environment.

Here is how to build with the main branch:
{code:yaml}
FROM openjdk:8-alpineENV 

[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken

2020-03-02 Thread Stefan Goldener (Jira)


[ 
https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048853#comment-17048853
 ] 

Stefan Goldener edited comment on MAHOUT-2093 at 3/2/20 8:00 AM:
-

What is really interesting... why do all tests run successfully (i.e. without 
-DskipTests)? It looks like the test environment differs from the real runtime 
environment.

Here is how to build with the main branch:
{code:yaml}
FROM openjdk:8-alpine
ENV spark_uid=185
ENV SCALA_MAJOR=2.11
ENV SCALA_MAJOR_MINOR=2.11.12
ENV HADOOP_MAJOR=2.7
ENV SPARK_MAJOR_MINOR=2.4.5
ENV MAHOUT_MAJOR_MINOR=14.1
ENV MAHOUT_VERSION=mahout-${MAHOUT_MAJOR_MINOR}
ENV MAHOUT_BASE=/opt/mahout
ENV MAHOUT_HOME=${MAHOUT_BASE}/${MAHOUT_VERSION}
ENV SPARK_VERSION=spark-${SPARK_MAJOR_MINOR}
ENV SPARK_BASE=/opt/spark
ENV SPARK_HOME=${SPARK_BASE}/${SPARK_VERSION}
ENV MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"
ENV SPARK_SRC_URL="https://archive.apache.org/dist/spark/${SPARK_VERSION}/${SPARK_VERSION}.tgz"
ENV ZINC_PORT=3030

### build spark
RUN set -ex && \
apk upgrade --no-cache && \
ln -s /lib /lib64 && \
apk add --no-cache bash python py-pip tini libc6-compat linux-pam krb5 
krb5-libs nss curl openssl git maven && \
pip install setuptools && \
mkdir -p ${MAHOUT_HOME} && \
mkdir -p ${SPARK_BASE} && \
curl  -LfsS ${SPARK_SRC_URL} -o ${SPARK_HOME}.tgz  && \
tar -xzvf ${SPARK_HOME}.tgz -C ${SPARK_BASE}/ && \
rm ${SPARK_HOME}.tgz && \
export 
PATH=$PATH:$MAHOUT_HOME/bin:$MAHOUT_HOME/lib:$SPARK_HOME/bin:$JAVA_HOME/bin && \
bash ${SPARK_HOME}/dev/change-scala-version.sh ${SCALA_MAJOR} && \
bash ${SPARK_HOME}/dev/make-distribution.sh --name ${DATE}-${REVISION} 
--pip --tgz -DzincPort=${ZINC_PORT} \
-Phadoop-${HADOOP_MAJOR} -Pkubernetes -Pkinesis-asl -Phive 
-Phive-thriftserver -Pscala-${SCALA_MAJOR}
 
### build mahout
RUN git clone https://github.com/apache/mahout.git ${MAHOUT_HOME} && \
cd ${MAHOUT_HOME} && \
sed -i '257d' ./bin/mahout && \
mvn -Dspark.version=${SPARK_MAJOR_MINOR} 
-Dscala.version=${SCALA_MAJOR_MINOR} -Dscala.compat.version=${SCALA_MAJOR} 
-DskipTests -Dmaven.javadoc.skip=true clean package
{code}
Please note the *sed -i '257d' ./bin/mahout* line.

It works around an issue in the main branch that causes an error.

In addition, scopt/OptionParser now throws an error:
{code:bash}
bash-4.4# ./bin/mahout spark-itemsimilarity
Adding lib/ to CLASSPATH
:/opt/mahout/mahout-14.1/lib/mahout-core_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-hdfs_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-spark-cli-drivers_2.11-14.1-SNAPSHOT.jar:/opt/mahout/mahout-14.1/lib/mahout-spark_2.11-14.1-SNAPSHOT-dependency-reduced.jar:/opt/mahout/mahout-14.1/lib/mahout-spark_2.11-14.1-SNAPSHOT.jar:/opt/spark/spark-2.4.5/jars/*::/opt/mahout/mahout-14.1/bin/mahout-spark-class.sh
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: scopt/OptionParser
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at 
sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: scopt.OptionParser
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 19 more
{code}


was (Author: renedlog):
What is really interesting... why do all tests run successfully (i.e. without 
-DskipTests)? It looks like the test environment differs from the real runtime 
environment.

Here is how to build with the main branch:
{code:yaml}
FROM openjdk:8-alpineENV 

[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken

2020-03-01 Thread Andrew Palumbo (Jira)


[ 
https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048499#comment-17048499
 ] 

Andrew Palumbo edited comment on MAHOUT-2093 at 3/1/20 11:01 AM:
-

[~renedlog] This is an issue with the Scopt 3.3.0 CLI interface. When we 
upgraded to Scala 2.11.x, Scopt 3.3.0 started giving us problems with, I 
believe, some conflicting transitive dependencies, or something along those 
lines. We've upgraded to Scopt v3.7.1 in the current master for v14.1, which 
has solved the problem.

The Mahout Spark Shell is actually handled differently in the call to 
{{/bin/mahout}} [1]: it is a pass-through to Spark's Scala shell, with the 
Mahout Spark-specific and abstract {{.jars}} added, so it does not use the 
Scopt CLI drivers [1][2][3], which is why the shell works without issue in that 
release. 
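That pass-through can be sketched roughly as follows (illustrative paths, not 
the actual bin/mahout code): the wrapper just builds a comma-separated --jars 
list and hands it to spark-shell, so no Scopt parsing is involved.
{code:bash}
# Build the comma-separated jar list a wrapper would pass to spark-shell
# (the demo directory is an illustrative stand-in for $MAHOUT_HOME/lib).
mkdir -p /tmp/demo-mahout/lib
touch /tmp/demo-mahout/lib/mahout-core.jar /tmp/demo-mahout/lib/mahout-spark.jar

MAHOUT_JARS=$(echo /tmp/demo-mahout/lib/mahout-*.jar | tr ' ' ',')
echo "$MAHOUT_JARS"
# A real wrapper would then run, e.g.:
#   exec "$SPARK_HOME/bin/spark-shell" --jars "$MAHOUT_JARS"
{code}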

0.14.0 is a huge refactor of the codebase: we are still moving Mahout-Hadoop 
MapReduce into the background, and we're still working out some of the kinks of 
this refactor in 14.1.

I would suggest that you try the last RC, but I believe the distribution module 
was missing from the source distribution, which broke the build and was the 
reason we scrapped it.

CLI drivers should be working in the current {{github/master}}: 
[https://github.com/apache/mahout.git] which is currently (mostly) stable.

Thanks for reporting it.

[1] [https://github.com/apache/mahout/blob/branch-0.14.0/bin/mahout#L299-L314]
 [2] 
[https://github.com/apache/mahout/blob/branch-0.14.0/community/spark-cli-drivers/src/main/scala/org/apache/mahout/drivers/ItemSimilarityDriver.scala#L44]
 [3] 
[https://github.com/apache/mahout/blob/branch-0.14.0/community/spark-cli-drivers/src/main/scala/org/apache/mahout/drivers/MahoutOptionParser.scala#L30]


was (Author: andrew_palumbo):
This is an issue with the Scopt 3.3.0 CLI interface.  We've upgraded in the 
current master for v14.1 to Scopt v3.7.1, which has solved the problem.

The Mahout Spark Shell is actually handled differently in the call to 
`/bin/mahout`, and is a pass-through to Spark's Scala shell [1], with the 
mahout jars added, so it does not use the Scopt CLI drivers [1][2][3], which is 
why it works without issue in that release. 

0.14.1 is a huge refactor of the codebase; we're still working out some of the 
kinks in 14.1. 

I would suggest the last RC, but I believe a module was missing from the source 
distribution, which was the reason we scrapped it.

CLI drivers should be working in the current {{github/master}}: 
[https://github.com/apache/mahout.git] which is currently (mostly) stable.

[1] [https://github.com/apache/mahout/blob/branch-0.14.0/bin/mahout#L299-L314]
 [2] 
[https://github.com/apache/mahout/blob/branch-0.14.0/community/spark-cli-drivers/src/main/scala/org/apache/mahout/drivers/ItemSimilarityDriver.scala#L44]
 [3] 
[https://github.com/apache/mahout/blob/branch-0.14.0/community/spark-cli-drivers/src/main/scala/org/apache/mahout/drivers/MahoutOptionParser.scala#L30]

> Mahout Source Broken
> 
>
> Key: MAHOUT-2093
> URL: https://issues.apache.org/jira/browse/MAHOUT-2093
> Project: Mahout
>  Issue Type: Bug
>  Components: Algorithms, Collaborative Filtering, Documentation
>Affects Versions: 0.14.0, 0.13.2
>Reporter: Stefan Goldener
>Priority: Blocker
>
> It seems like newer versions of Mahout have problems with the Spark bindings, 
> e.g. mahout spark-itemsimilarity and mahout spark-rowsimilarity do not work 
> due to class-not-found exceptions. 
> {code:java}
> Error: Could not find or load main class 
> org.apache.mahout.drivers.RowSimilarityDriver
> {code}
> {code:java}
> Error: Could not find or load main class 
> org.apache.mahout.drivers.ItemSimilarityDriver
> {code}
> whereas *mahout spark-shell* works flawlessly.
> Here is a short Dockerfile to show the issue:
> {code:yaml}
> FROM openjdk:8-alpine
> ENV spark_uid=185
> ENV SCALA_MAJOR=2.11
> ENV SCALA_MAJOR_MINOR=2.11.12
> ENV HADOOP_MAJOR=2.7
> ENV SPARK_MAJOR_MINOR=2.4.5
> ENV MAHOUT_MAJOR_MINOR=0.14.0
> ENV MAHOUT_VERSION=mahout-${MAHOUT_MAJOR_MINOR}
> ENV MAHOUT_BASE=/opt/mahout
> ENV MAHOUT_HOME=${MAHOUT_BASE}/${MAHOUT_VERSION}
> ENV SPARK_VERSION=spark-${SPARK_MAJOR_MINOR}
> ENV SPARK_BASE=/opt/spark
> ENV SPARK_HOME=${SPARK_BASE}/${SPARK_VERSION}
> ENV MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"
> ENV SPARK_SRC_URL="https://archive.apache.org/dist/spark/${SPARK_VERSION}/${SPARK_VERSION}.tgz"
> ENV MAHOUT_SRC_URL="https://archive.apache.org/dist/mahout/${MAHOUT_MAJOR_MINOR}/mahout-${MAHOUT_MAJOR_MINOR}-source-release.zip"
> ENV ZINC_PORT=3030
> ### build spark
> RUN set -ex && \
> apk upgrade --no-cache && \
> ln -s /lib /lib64 && \
> apk add --no-cache bash python py-pip tini libc6-compat linux-pam krb5 
> krb5-libs nss curl 

[jira] [Comment Edited] (MAHOUT-2093) Mahout Source Broken

2020-03-01 Thread Andrew Palumbo (Jira)


[ 
https://issues.apache.org/jira/browse/MAHOUT-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048499#comment-17048499
 ] 

Andrew Palumbo edited comment on MAHOUT-2093 at 3/1/20 8:59 AM:


This is an issue with the Scopt 3.3.0 CLI interface.  We've upgraded in the 
current master for v14.1 to Scopt v3.7.1, which has solved the problem.

The Mahout Spark Shell is actually handled differently in the call to 
`/bin/mahout`, and is a pass-through to Spark's Scala shell [1], with the 
mahout jars added, so it does not use the Scopt CLI drivers [1][2][3], which is 
why it works without issue in that release. 

0.14.1 is a huge refactor of the codebase; we're still working out some of the 
kinks in 14.1. 

I would suggest the last RC, but I believe a module was missing from the source 
distribution, which was the reason we scrapped it.

CLI drivers should be working in the current {{github/master}}: 
[https://github.com/apache/mahout.git] which is currently (mostly) stable.

[1] [https://github.com/apache/mahout/blob/branch-0.14.0/bin/mahout#L299-L314]
 [2] 
[https://github.com/apache/mahout/blob/branch-0.14.0/community/spark-cli-drivers/src/main/scala/org/apache/mahout/drivers/ItemSimilarityDriver.scala#L44]
 [3] 
[https://github.com/apache/mahout/blob/branch-0.14.0/community/spark-cli-drivers/src/main/scala/org/apache/mahout/drivers/MahoutOptionParser.scala#L30]


was (Author: andrew_palumbo):
This is an issue with the Scopt 3.3.0 CLI interface.  We've upgraded in the 
current master for v14.1 to Scopt v3.7.1, which has solved the problem.

The Mahout Spark Shell is actually handled differently in the call to 
`/bin/mahout`, and is a pass-through to Spark's Scala shell [1], with the 
mahout jars added, so it does not use the Scopt CLI drivers [1][2][3], which is 
why it works without issue in that release.  

0.14.1 is a huge refactor of the codebase; we're still working out some of the 
kinks in 14.1.  

I would suggest the last RC, but I believe a module was missing from the source 
distribution, which was the reason we scrapped it.  It should be working 
in `github/master`: [https://github.com/apache/mahout.git] which is currently 
(mostly) stable.

[1] [https://github.com/apache/mahout/blob/branch-0.14.0/bin/mahout#L299-L314]
[2] 
[https://github.com/apache/mahout/blob/branch-0.14.0/community/spark-cli-drivers/src/main/scala/org/apache/mahout/drivers/ItemSimilarityDriver.scala#L44]
[3] 
https://github.com/apache/mahout/blob/branch-0.14.0/community/spark-cli-drivers/src/main/scala/org/apache/mahout/drivers/MahoutOptionParser.scala#L30

> Mahout Source Broken
> 
>
> Key: MAHOUT-2093
> URL: https://issues.apache.org/jira/browse/MAHOUT-2093
> Project: Mahout
>  Issue Type: Bug
>  Components: Algorithms, Collaborative Filtering, Documentation
>Affects Versions: 0.14.0, 0.13.2
>Reporter: Stefan Goldener
>Priority: Blocker
>
> It seems like newer versions of Mahout have problems with the Spark bindings, 
> e.g. mahout spark-itemsimilarity and mahout spark-rowsimilarity do not work 
> due to class-not-found exceptions. 
> {code:java}
> Error: Could not find or load main class 
> org.apache.mahout.drivers.RowSimilarityDriver
> {code}
> {code:java}
> Error: Could not find or load main class 
> org.apache.mahout.drivers.ItemSimilarityDriver
> {code}
> whereas *mahout spark-shell* works flawlessly.
> Here is a short Dockerfile to show the issue:
> {code:yaml}
> FROM openjdk:8-alpine
> ENV spark_uid=185
> ENV SCALA_MAJOR=2.11
> ENV SCALA_MAJOR_MINOR=2.11.12
> ENV HADOOP_MAJOR=2.7
> ENV SPARK_MAJOR_MINOR=2.4.5
> ENV MAHOUT_MAJOR_MINOR=0.14.0
> ENV MAHOUT_VERSION=mahout-${MAHOUT_MAJOR_MINOR}
> ENV MAHOUT_BASE=/opt/mahout
> ENV MAHOUT_HOME=${MAHOUT_BASE}/${MAHOUT_VERSION}
> ENV SPARK_VERSION=spark-${SPARK_MAJOR_MINOR}
> ENV SPARK_BASE=/opt/spark
> ENV SPARK_HOME=${SPARK_BASE}/${SPARK_VERSION}
> ENV MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"
> ENV SPARK_SRC_URL="https://archive.apache.org/dist/spark/${SPARK_VERSION}/${SPARK_VERSION}.tgz"
> ENV MAHOUT_SRC_URL="https://archive.apache.org/dist/mahout/${MAHOUT_MAJOR_MINOR}/mahout-${MAHOUT_MAJOR_MINOR}-source-release.zip"
> ENV ZINC_PORT=3030
> ### build spark
> RUN set -ex && \
> apk upgrade --no-cache && \
> ln -s /lib /lib64 && \
> apk add --no-cache bash python py-pip tini libc6-compat linux-pam krb5 
> krb5-libs nss curl openssl git maven && \
> pip install setuptools && \
> mkdir -p ${MAHOUT_HOME} && \
> mkdir -p ${SPARK_BASE} && \
> curl  -LfsS ${SPARK_SRC_URL} -o ${SPARK_HOME}.tgz  && \
> tar -xzvf ${SPARK_HOME}.tgz -C ${SPARK_BASE}/ && \
> rm ${SPARK_HOME}.tgz && \
> export 
> PATH=$PATH:$MAHOUT_HOME/bin:$MAHOUT_HOME/lib:$SPARK_HOME/bin:$JAVA_HOME/bin 
> && \
> bash ${SPARK_HOME}/dev/change-scala-version.sh