Re: ExternalAppendOnlyMap throw no such element

2014-02-18 Thread guojc
he bug fix https://github.com/apache/incubator-spark/pull/612 Best Regards, Jiacheng Guo On Mon, Jan 27, 2014 at 2:36 PM, guojc wrote: > Hi Patrick, > I have created the jira > https://spark-project.atlassian.net/browse/SPARK-1045. It turns out the > situation is related to j

Re: ExternalAppendOnlyMap throw no such element

2014-01-26 Thread guojc
Hi Patrick, I have created the jira https://spark-project.atlassian.net/browse/SPARK-1045. It turns out the situation is related to joining two large RDDs, not to the combine process as previously thought. Best Regards, Jiacheng Guo On Mon, Jan 27, 2014 at 11:07 AM, guojc wrote: >

Re: ExternalAppendOnlyMap throw no such element

2014-01-26 Thread guojc
the > final combined output *for a given key* in memory. If you are > outputting GB of data for a single key, then you might also look into > a different parallelization strategy for your algorithm. Not sure if > this is also an issue though... > > - Patrick > > On Sun,

Re: ExternalAppendOnlyMap throw no such element

2014-01-26 Thread guojc
t the cause. Thanks, Jiacheng Guo On Wed, Jan 22, 2014 at 1:36 PM, Patrick Wendell wrote: > This code has been modified since you reported this so you may want to > try the current master. > > - Patrick > > On Mon, Jan 20, 2014 at 4:22 AM, guojc wrote: > > Hi, > >

Re: Does foreach operation increase rdd lineage?

2014-01-24 Thread guojc
ementation, but if the data does not need to be joined > together, you'd better keep it in the workers. > > > 2014/1/24 guojc > >> Hi, >> I'm writing a parallel MCMC program that has a very large dataset >> in memory, and need to update the dataset in-memory and avoid cre

Does foreach operation increase rdd lineage?

2014-01-24 Thread guojc
Hi, I'm writing a parallel MCMC program that holds a very large dataset in memory, and I need to update the dataset in place and avoid creating an additional copy. Should I use a foreach operation on the RDD to express the change, or do I have to create a new RDD after each sampling step? Thanks, J
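For readers of the archive, the distinction the question turns on can be sketched in a few lines of Scala. This is a minimal illustration, not code from the thread; the class and value names are invented, and whether an in-place update made inside foreach survives depends on the data being cached as deserialized objects, which is an implementation detail rather than a guarantee.

```scala
import org.apache.spark.SparkContext

// Mutable record type, used only for illustration.
case class Sample(var value: Double)

object ForeachVsMapSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "foreach-vs-map")

    val data = sc.parallelize(1 to 1000).map(i => Sample(i.toDouble)).cache()

    // foreach is an action: it does not create a new RDD, so the lineage does
    // not grow.  If the RDD is cached as deserialized objects, this mutates the
    // cached copies in place -- behaviour Spark does not promise to preserve.
    data.foreach(s => s.value *= 0.5)

    // map is a transformation: it returns a new RDD and extends the lineage,
    // leaving the original data untouched.
    val updated = data.map(s => Sample(s.value * 0.5))

    println(updated.count())
    sc.stop()
  }
}
```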

ExternalAppendOnlyMap throw no such element

2014-01-20 Thread guojc
Hi, I'm trying out the latest master branch of Spark for the exciting external hashmap feature. I have code that runs correctly on Spark 0.8.1, and I only made a change so that it spills to disk more easily. However, I encounter a few task failures with java.util.NoSuchElementException (java.util.
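The posting does not show the exact change that made the job spill; the sketch below only illustrates the kind of 0.9-era configuration that pushes aggregations into the external append-only map. The property values here are arbitrary assumptions, not the poster's settings.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._  // pair-RDD functions in the 0.8/0.9 API

object SpillSketch {
  def main(args: Array[String]): Unit = {
    // Set before the SparkContext is created (the 0.8/0.9-era style).
    System.setProperty("spark.shuffle.spill", "true")          // allow spilling to disk
    System.setProperty("spark.shuffle.memoryFraction", "0.05") // spill early; value is arbitrary

    val sc = new SparkContext("local", "spill-sketch")
    val pairs = sc.parallelize(1 to 1000000).map(i => (i % 1000, 1L))

    // With spilling enabled, this aggregation runs through the external append-only map.
    println(pairs.reduceByKey(_ + _).count())
    sc.stop()
  }
}
```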

Re: App master failed to find application jar in the master branch on YARN

2013-11-19 Thread guojc
taging/application_1384874528558_0003/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.0.5-alpha.jar > > Tom > > > On Tuesday, November 19, 2013 5:35 AM, guojc wrote: > Hi Tom, > Thank you for your response. I have double checked that I had uploaded > both jars in the same f

Re: App master failed to find application jar in the master branch on YARN

2013-11-19 Thread guojc
that and make sure to > put hdfs:// on them when you export SPARK_JAR and specify the --jar option. > > > I'll try to reproduce the error tomorrow to see if a bug was introduced > when I added the feature to run spark from HDFS. > > Tom > > > On Monday, November 18, 20

Re: App master failed to find application jar in the master branch on YARN

2013-11-18 Thread guojc
PARK_EXAMPLES_JAR env variable. > > You should only have to set SPARK_JAR env variable. > > If that isn't the issue let me know the build command you used and hadoop > version, and your defaultFs or hadoop. > > Tom > > > On Saturday, November 16, 2013 2:32 AM, guoj

Re: Does spark RDD has a partitionedByKey

2013-11-16 Thread guojc
ith using the Shark layer above Spark (and I think > for many use cases the answer would be "yes"), then you can take advantage > of Shark's co-partitioning. Or do something like > https://github.com/amplab/shark/pull/100/commits > > Sent while mobile. Pls excuse typos etc

App master failed to find application jar in the master branch on YARN

2013-11-16 Thread guojc
hi, After reading about the exciting progress in consolidating shuffle, I'm eager to try out the latest master branch. However, upon launching the example application, the job failed with a prompt that the app master failed to find the target jar. appDiagnostics: Application application_1384588058297_0017

Re: Does spark RDD has a partitionedByKey

2013-11-15 Thread guojc
/researcher/files/us-ytian/hadoopjoin.pdf for PerSplit SemiJoin's details. Best Regards, Jiacheng Guo On Sat, Nov 16, 2013 at 3:02 AM, Meisam Fathi wrote: > Hi guojc, > > It is not clear to me what problem you are trying to solve. What do > you want to do with the resu

How to override yarn default java.io.tmpdir and spark.local.dir

2013-11-15 Thread guojc
Hi, How can I override the default java.io.tmpdir and spark.local.dir on YARN? I have tried setting SPARK_YARN_USER_ENV with SPARK_JAVA_OPTS, but it seems to have no effect. The location still comes from YarnConfiguration.DEFAULT_CONTAINER_TEMP_DIR, and it is a very small disk for me. Any suggestions? Thanks
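As a hedged aside rather than an answer from the thread: outside YARN the Spark temp location can be pointed at a bigger disk with a plain property, but under YARN the containers' working directories usually come from the NodeManager's yarn.nodemanager.local-dirs setting, so the sketch below (with illustrative paths) may not apply to the YARN case described here.

```scala
import org.apache.spark.SparkContext

object LocalDirSketch {
  def main(args: Array[String]): Unit = {
    // Standalone/local deployments: point shuffle and temp files at a bigger disk.
    // The path is illustrative.  java.io.tmpdir is normally set at JVM launch
    // (e.g. -Djava.io.tmpdir=/data/jvm-tmp) rather than from inside the program.
    System.setProperty("spark.local.dir", "/data/spark-tmp")

    val sc = new SparkContext("local", "local-dir-sketch")
    // ... job code ...
    sc.stop()
  }
}
```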

Re: Does spark RDD has a partitionedByKey

2013-11-15 Thread guojc
if the > default partitioner does not suit your purpose. > You can take a look at this > > http://ampcamp.berkeley.edu/wp-content/uploads/2012/06/matei-zaharia-amp-camp-2012-advanced-spark.pdf > . > > Thanks, > Meisam > > On Fri, Nov 15, 2013 at 6:54 AM, guojc wrote: > >
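The quoted advice is about supplying your own partitioner when the default one does not fit. Below is a minimal custom Partitioner sketch in Scala; the class name and keying rule are invented purely for illustration.

```scala
import org.apache.spark.Partitioner

// Route string keys to partitions by their first character instead of the
// full hashCode; equals/hashCode let Spark recognise co-partitioned RDDs.
class FirstCharPartitioner(val numParts: Int) extends Partitioner {
  override def numPartitions: Int = numParts

  override def getPartition(key: Any): Int = key match {
    case s: String if s.nonEmpty => s.head.toInt % numParts
    case _                       => 0
  }

  override def equals(other: Any): Boolean = other match {
    case p: FirstCharPartitioner => p.numParts == numParts
    case _                       => false
  }

  override def hashCode: Int = numParts
}
```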

Does spark RDD has a partitionedByKey

2013-11-15 Thread guojc
Hi, I'm wondering whether Spark RDD can have a partitionedByKey function? The use of this function is to have an RDD distributed according to a certain partitioner and cache it. Then joins between RDDs with the same partitioner will get a great speedup. Currently, we only have a groupBy
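For what it's worth, the existing pair-RDD API already covers roughly this pattern: partitionBy followed by cache lays an RDD out by a chosen partitioner and keeps it in memory, and a later join against another RDD with the same partitioner can avoid re-shuffling that side. A minimal sketch (names and data invented):

```scala
import org.apache.spark.{HashPartitioner, SparkContext}
import org.apache.spark.SparkContext._  // pair-RDD functions in the 0.8/0.9 API

object CoPartitionedJoinSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "copartition-sketch")
    val partitioner = new HashPartitioner(8)

    // partitionBy + cache plays the role of the "partitionedByKey" asked about:
    // the RDD is laid out by the partitioner once and kept in memory.
    val left  = sc.parallelize(Seq(("a", 1), ("b", 2))).partitionBy(partitioner).cache()
    val right = sc.parallelize(Seq(("a", 9), ("c", 3))).partitionBy(partitioner).cache()

    // Both sides share the same partitioner, so this join avoids a full shuffle.
    val joined = left.join(right)
    println(joined.collect().toSeq)
    sc.stop()
  }
}
```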

Does Spark has a partitionByKey function

2013-11-15 Thread guojc
Hi,