Just adding more information on how I built the custom distribution: I cloned the Spark repo, switched to branch-2.2, then made the distribution as follows.
λ ~/workspace/big_data/spark/ branch-2.2*
λ ~/workspace/big_data/spark/ ./dev/make-distribution.sh --name custom --tgz -Phadoop-2.7 -Dhadoop.version=2.7.0 -Phive -Phive-thriftserver -Pmesos -Pyarn

On Mon, Jun 12, 2017 at 6:14 PM Chanh Le <giaosu...@gmail.com> wrote:

> Hi everyone,
>
> Recently I discovered an issue when processing CSV files in Spark, so I
> decided to fix it following https://issues.apache.org/jira/browse/SPARK-21024.
> I built a custom distribution for internal use. I built it on my local
> machine and then uploaded the distribution to the server.
>
> The server's *~/.bashrc*:
>
> # added by Anaconda2 4.3.1 installer
> export PATH="/opt/etl/anaconda/anaconda2/bin:$PATH"
> export SPARK_HOME="/opt/etl/spark-2.1.0-bin-hadoop2.7"
> export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH
>
> What I did on the server was:
>
> export SPARK_HOME=/home/etladmin/spark-2.2.1-SNAPSHOT-bin-custom
> $SPARK_HOME/bin/spark-submit --version
>
> It prints out version *2.1.1*, which *is not* the version I built (2.2.1).
>
> I also set *SPARK_HOME* on my local machine (macOS) for this distribution,
> and it works well, printing out version *2.2.1*.
>
> I need a way to investigate the invisible environment variable.
>
> Do you have any suggestions?
> Thanks in advance.
>
> Regards,
> Chanh

--
Regards,
Chanh
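The "invisible environment variable" in the quoted mail can usually be tracked down with a few standard shell checks. A minimal sketch (the SPARK_HOME path is taken from the thread; the list of startup files to grep is illustrative and may differ on your server):

```shell
# 1. What does the current shell think SPARK_HOME is?
export SPARK_HOME=/home/etladmin/spark-2.2.1-SNAPSHOT-bin-custom
echo "$SPARK_HOME"

# 2. Does a fresh login shell keep that value, or does a startup file
#    silently reset it back to the old 2.1.0 install?
bash -lc 'echo "$SPARK_HOME"'

# 3. Find every startup file that touches SPARK_HOME:
grep -n "SPARK_HOME" ~/.bashrc ~/.bash_profile ~/.profile /etc/profile 2>/dev/null

# 4. Confirm which spark-submit binary is actually being resolved:
command -v spark-submit
ls -l "$SPARK_HOME/bin/spark-submit"
```

If step 2 prints the old `/opt/etl/spark-2.1.0-bin-hadoop2.7` path, the `export SPARK_HOME=...` line in the server's `~/.bashrc` is re-applied whenever a new login or non-interactive shell starts (e.g. via ssh or a scheduler), overriding the value set interactively.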