Running this now:

./make-distribution.sh --tgz -Phadoop-2.4 -Pyarn -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package
Waiting for it to complete. There is no progress after the initial log messages:

//LOGS
$ ./make-distribution.sh --tgz -Phadoop-2.4 -Pyarn -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package
+++ dirname ./make-distribution.sh
++ cd .
++ pwd
+ SPARK_HOME=/Users/dvasthimal/ebay/projects/ep/spark-1.4.0
+ DISTDIR=/Users/dvasthimal/ebay/projects/ep/spark-1.4.0/dist
+ SPARK_TACHYON=false
+ TACHYON_VERSION=0.6.4
+ TACHYON_TGZ=tachyon-0.6.4-bin.tar.gz
+ TACHYON_URL=https://github.com/amplab/tachyon/releases/download/v0.6.4/tachyon-0.6.4-bin.tar.gz
+ MAKE_TGZ=false
+ NAME=none
+ MVN=/Users/dvasthimal/ebay/projects/ep/spark-1.4.0/build/mvn
+ (( 9 ))
+ case $1 in
+ MAKE_TGZ=true
+ shift
+ (( 8 ))
+ case $1 in
+ break
+ '[' -z /Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home/ ']'
+ '[' -z /Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home/ ']'
++ command -v git
+ '[' /usr/bin/git ']'
++ git rev-parse --short HEAD
++ :
+ GITREV=
+ '[' '!' -z '' ']'
+ unset GITREV
++ command -v /Users/dvasthimal/ebay/projects/ep/spark-1.4.0/build/mvn
+ '[' '!' /Users/dvasthimal/ebay/projects/ep/spark-1.4.0/build/mvn ']'
++ /Users/dvasthimal/ebay/projects/ep/spark-1.4.0/build/mvn help:evaluate -Dexpression=project.version -Phadoop-2.4 -Pyarn -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package
++ grep -v INFO
++ tail -n 1
//LOGS

On Sun, Jun 28, 2015 at 12:17 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:

> I just did that. Where can I find that "spark-1.4.0-bin-hadoop2.4.tgz" file?
>
> On Sun, Jun 28, 2015 at 12:15 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> You can use the following command to build Spark after applying the pull request:
>>
>> mvn -DskipTests -Phadoop-2.4 -Pyarn -Phive clean package
>>
>> Cheers
>>
>> On Sun, Jun 28, 2015 at 11:43 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>>
>>> I see that block join support did not make it into the Spark 1.4 release.
>>>
>>> Can you share instructions for building Spark with this support for the Hadoop 2.4.x distribution?
>>>
>>> Appreciate it.
>>>
>>> On Fri, Jun 26, 2015 at 9:23 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>>>
>>>> This is nice. Which version of Spark has this support? Or do I need to build it?
>>>> I have never built Spark from git; please share instructions for Hadoop 2.4.x YARN.
>>>>
>>>> I am struggling a lot to get a join to work between 200G and 2TB datasets.
>>>> I am constantly getting this exception. Thousands of executors are failing with:
>>>>
>>>> 15/06/26 13:05:28 ERROR storage.ShuffleBlockFetcherIterator: Failed to get block(s) from phxdpehdc9dn2125.stratus.phx.ebay.com:60162
>>>> java.io.IOException: Failed to connect to executor_host_name/executor_ip_address:60162
>>>>     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
>>>>     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
>>>>     at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
>>>>     at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
>>>>     at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
>>>>     at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>
>>>> On Fri, Jun 26, 2015 at 3:20 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>
>>>>> We went through a similar process, switching from Scalding (where everything just works on large datasets) to Spark (where it does not).
>>>>>
>>>>> Spark can be made to work on very large datasets; it just requires a little more effort. Pay attention to your storage levels (should be memory-and-disk or disk-only), your number of partitions (should be large, a multiple of the number of executors), and avoid groupByKey. (A sketch of this advice follows the thread.)
>>>>>
>>>>> Also see:
>>>>> https://github.com/tresata/spark-sorted (for avoiding in-memory operations for certain types of reduce operations)
>>>>> https://github.com/apache/spark/pull/6883 (for blockJoin)
>>>>>
>>>>> On Fri, Jun 26, 2015 at 5:48 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>>>>>
>>>>>> Not far at all. On large datasets everything simply fails with Spark. Worst of all, I am not able to figure out the reason for the failures: the logs run into millions of lines, and I do not know which keywords to search for to find the failure reason.
>>>>>>
>>>>>> On Mon, Jun 15, 2015 at 6:52 AM, Night Wolf <nightwolf...@gmail.com> wrote:
>>>>>>
>>>>>>> How far did you get?
>>>>>>>
>>>>>>> On Tue, Jun 2, 2015 at 4:02 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>>>>>>>
>>>>>>>> We use Scoobi + MR to perform joins, and we particularly use the blockJoin() API of Scoobi:
>>>>>>>>
>>>>>>>> /** Perform an equijoin with another distributed list where this list is considerably smaller
>>>>>>>>  * than the right (but too large to fit in memory), and where the keys of right may be
>>>>>>>>  * particularly skewed. */
>>>>>>>> def blockJoin[B : WireFormat](right: DList[(K, B)]): DList[(K, (A, B))] =
>>>>>>>>   Relational.blockJoin(left, right)
>>>>>>>>
>>>>>>>> I am trying to do a POC; which Spark join API(s) are recommended to achieve something similar (see the sketch after the thread)?
>>>>>>>>
>>>>>>>> Please suggest.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Deepak
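For the blockJoin question above, a minimal sketch of one common way to approximate a skew-tolerant equijoin in plain Spark 1.x (without the blockJoin pull request), using key salting. The `saltedJoin` helper, the `saltFactor` value, and the partition count below are illustrative assumptions, not from the thread: the smaller side is replicated once per salt value, while each record of the larger, skewed side picks a random salt, so hot keys spread over saltFactor reduce tasks instead of one.

import scala.util.Random
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical helper: small is the ~200G side, large the ~2TB skewed side.
def saltedJoin(small: RDD[(String, String)],
               large: RDD[(String, String)],
               saltFactor: Int): RDD[(String, (String, String))] = {
  // Replicate every record of the smaller side once per salt value,
  // so every salted key of the large side finds its match.
  val saltedSmall = small.flatMap { case (k, v) =>
    (0 until saltFactor).map(s => ((k, s), v))
  }
  // Tag each record of the skewed side with a random salt so hot keys spread out.
  val saltedLarge = large.map { case (k, v) => ((k, Random.nextInt(saltFactor)), v) }
  // Join on (key, salt) with many partitions, then drop the salt;
  // persist disk-backed per the storage-level advice above.
  saltedSmall.join(saltedLarge, 4096)
    .map { case ((k, _), (a, b)) => (k, (a, b)) }
    .persist(StorageLevel.MEMORY_AND_DISK)
}

If the smaller side actually fits in executor memory, broadcasting it as a map and joining with mapPartitions avoids the shuffle entirely; blockJoin targets the in-between case where it does not fit but the keys are skewed.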
--
Deepak
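And a minimal sketch of Koert's tuning advice above (disk-backed storage level, a high partition count, reduceByKey instead of groupByKey); the input path and the partition count are illustrative assumptions, and `sc` is assumed to be an existing SparkContext:

import org.apache.spark.storage.StorageLevel

// Tab-separated input keyed by the first column, reduced to per-key counts.
val pairs = sc.textFile("hdfs:///events")
  .map(line => (line.split('\t')(0), 1L))
  .persist(StorageLevel.MEMORY_AND_DISK) // spill to disk rather than fail when memory is tight

// Many partitions (a multiple of the executor count) keep individual tasks and
// shuffle blocks small, which can also ease the fetch failures seen above.
val counts = pairs.reduceByKey(_ + _, 4096) // combines map-side, unlike groupByKey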