the bug fix
https://github.com/apache/incubator-spark/pull/612
Best Regards,
Jiacheng Guo
On Mon, Jan 27, 2014 at 2:36 PM, guojc wrote:
> Hi Patrick,
> I have created the JIRA
> https://spark-project.atlassian.net/browse/SPARK-1045. It turns out the
> situation is related to j
Hi Patrick,
I have created the JIRA
https://spark-project.atlassian.net/browse/SPARK-1045. It turns out the
situation is related to joining two large RDDs, not to the combine process
as previously thought.
Best Regards,
Jiacheng Guo
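(A minimal illustrative sketch of the distinction drawn above, not taken from
the thread: the combine path aggregates per key map-side, while a join of two
large RDDs must co-locate every value for each key. The input paths and the
key extraction are hypothetical.)

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._   // pair-RDD functions in 0.8/0.9-era Spark

    val sc = new SparkContext("local[4]", "join-vs-combine")

    // Hypothetical stand-ins for the two large RDDs.
    val left  = sc.textFile("hdfs:///tmp/left").map(line => (line.split("\t")(0), 1L))
    val right = sc.textFile("hdfs:///tmp/right").map(line => (line.split("\t")(0), 1L))

    // Combine-style path: per-key aggregation with map-side combine.
    val counts = left.reduceByKey(_ + _)

    // Join of two large RDDs: all values for a key are brought together,
    // which is where the reported problem showed up.
    val joined = left.join(right)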
On Mon, Jan 27, 2014 at 11:07 AM, guojc wrote:
>
the
> final combined output *for a given key* in memory. If you are
> outputting GB of data for a single key, then you might also look into
> a different parallelization strategy for your algorithm. Not sure if
> this is also an issue though...
>
> - Patrick
>
> On Sun,
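(As an aside, one common "different parallelization strategy" for a key that
produces gigabytes of output is to salt the hot key so its values spread
across many reducers, then merge the partial results. This is a generic
sketch, not something prescribed in the thread; bigPairs and the bucket count
are hypothetical.)

    import org.apache.spark.SparkContext._   // pair-RDD functions in 0.8/0.9-era Spark
    import scala.util.Random

    // bigPairs: an RDD[(String, Long)] with one or more very hot keys (hypothetical).
    val salted = bigPairs.map { case (k, v) =>
      ((k, Random.nextInt(64)), v)            // spread each key over 64 buckets
    }
    val partial = salted.reduceByKey(_ + _)   // combine inside each salted bucket
    val merged  = partial
      .map { case ((k, _), v) => (k, v) }
      .reduceByKey(_ + _)                     // final merge per original key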
t the cause.
Thanks,
Jiacheng Guo
On Wed, Jan 22, 2014 at 1:36 PM, Patrick Wendell wrote:
> This code has been modified since you reported this so you may want to
> try the current master.
>
> - Patrick
>
> On Mon, Jan 20, 2014 at 4:22 AM, guojc wrote:
> > Hi,
> >
ementation, but if the data does not need to be joined
> together, you had better keep it on the workers.
>
>
> 2014/1/24 guojc
>
>> Hi,
>> I'm writing a parallel MCMC program that has a very large dataset
>> in memory, and need to update the dataset in-memory and avoid cre
Hi,
I'm writing a parallel MCMC program that has a very large dataset in
memory, and I need to update the dataset in memory and avoid creating an
additional copy. Should I use a foreach operation on the RDD to express the
change, or do I have to create a new RDD after each sampling step?
Thanks,
J
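(Not an answer from the thread, but a minimal sketch of the usual pattern,
assuming the dataset lives in an RDD: RDDs are immutable, so each sampling
step derives a new RDD from the previous one and the old generation is
unpersisted, keeping a single cached copy on the workers, in line with the
"keep it on the workers" advice quoted above. State, sampleStep, initialState
and numIterations are hypothetical stand-ins.)

    import org.apache.spark.rdd.RDD

    var chain: RDD[State] = initialState.cache()   // State: hypothetical element type
    for (iter <- 1 to numIterations) {
      val next = chain.map(sampleStep).cache()     // sampleStep: the MCMC update
      next.count()        // materialize the new generation before dropping the old
      chain.unpersist()
      chain = next
    }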
Hi,
I'm trying out the latest master branch of Spark for the exciting external
hashmap feature. I have code that runs correctly on Spark 0.8.1, and I only
made a change so that it spills to disk more easily. However, I encounter a
few task failures of
java.util.NoSuchElementException (java.util.
taging/application_1384874528558_0003/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.0.5-alpha.jar
>
> Tom
>
>
> On Tuesday, November 19, 2013 5:35 AM, guojc wrote:
> Hi Tom,
>    Thank you for your response. I have double-checked that I uploaded
> both jars in the same f
that and make sure to
> put hdfs:// on them when you export SPARK_JAR and specify the --jar option.
>
>
> I'll try to reproduce the error tomorrow to see if a bug was introduced
> when I added the feature to run spark from HDFS.
>
> Tom
>
>
> On Monday, November 18, 20
PARK_EXAMPLES_JAR env variable.
>
> You should only have to set SPARK_JAR env variable.
>
> If that isn't the issue, let me know the build command you used and Hadoop
> version, and your defaultFs or hadoop.
>
> Tom
>
>
> On Saturday, November 16, 2013 2:32 AM, guoj
ith using the Shark layer above Spark (and I think
> for many use cases the answer would be "yes"), then you can take advantage
> of Shark's co-partitioning. Or do something like
> https://github.com/amplab/shark/pull/100/commits
>
> Sent while mobile. Pls excuse typos etc
Hi,
After reading about the exciting progress on consolidating shuffle, I'm
eager to try out the latest master branch. However, upon launching the
example application, the job failed with a message that the app master
failed to find the target jar. appDiagnostics: Application
application_1384588058297_0017
/researcher/files/us-ytian/hadoopjoin.pdf for
PerSplit SemiJoin's details.
Best Regards,
Jiacheng Guo
On Sat, Nov 16, 2013 at 3:02 AM, Meisam Fathi wrote:
> Hi guojc,
>
> It is not clear to me what problem you are trying to solve. What do
> you want to do with the resu
Hi,
How can I override the default java.io.tmpdir and spark.local.dir in
YARN? I have tried to set SPARK_YARN_USER_ENV with SPARK_JAVA_OPTS, but it
seems to have no effect. The location is still taken from
YarnConfiguration.DEFAULT_CONTAINER_TEMP_DIR, and it is a very small disk
for me. Any suggestions?
Thanks
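(No answer appears in this fragment; for context, a hedged sketch of how
these directories were usually steered outside YARN in that era. The paths
are hypothetical, and the comment about YARN reflects my understanding rather
than anything stated in the thread.)

    import org.apache.spark.SparkContext

    // Outside YARN, scratch space is typically set through system properties
    // before the SparkContext is created (or as -D flags in SPARK_JAVA_OPTS).
    System.setProperty("spark.local.dir", "/data/bigdisk/spark-tmp")  // hypothetical path
    System.setProperty("java.io.tmpdir", "/data/bigdisk/tmp")         // hypothetical path

    val sc = new SparkContext("local[4]", "local-dir-demo")

    // On YARN, executor containers get their scratch directories from the
    // NodeManager (yarn.nodemanager.local-dirs), which can override the values
    // above, consistent with the behaviour described in the question.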
if the
> default partitioner does not suit your purpose.
> You can take a look at this
>
> http://ampcamp.berkeley.edu/wp-content/uploads/2012/06/matei-zaharia-amp-camp-2012-advanced-spark.pdf
> .
>
> Thanks,
> Meisam
>
> On Fri, Nov 15, 2013 at 6:54 AM, guojc wrote:
> >
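(A minimal sketch of the kind of custom partitioner alluded to above, for
when the default partitioner does not suit; the class and its routing rule
are hypothetical.)

    import org.apache.spark.Partitioner

    // Hypothetical example: route keys by their first character so related keys
    // land in the same partition. Only the contract matters: numPartitions and a
    // stable getPartition(key).
    class FirstCharPartitioner(val numPartitions: Int) extends Partitioner {
      def getPartition(key: Any): Int = key match {
        case s: String if s.nonEmpty => math.abs(s.head.toInt) % numPartitions
        case other                   => math.abs(other.hashCode) % numPartitions
      }
      // Equality lets Spark recognise already co-partitioned RDDs and skip shuffles.
      override def equals(other: Any): Boolean = other match {
        case p: FirstCharPartitioner => p.numPartitions == numPartitions
        case _                       => false
      }
      override def hashCode: Int = numPartitions
    }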
Hi,
I'm wondering whether a Spark RDD can have a partitionedByKey function. The
use of this function would be to have an RDD distributed according to a
certain partitioner and cached. Then further joins between RDDs with the
same partitioner would get a great speedup. Currently, we only have a
groupBy
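(For what it is worth, the behaviour asked about here is close to what
partitionBy plus cache on a pair RDD gives; a hedged sketch with hypothetical
inputs, in the spirit of the AMP Camp slides linked above.)

    import org.apache.spark.HashPartitioner
    import org.apache.spark.SparkContext._   // pair-RDD functions in 0.8/0.9-era Spark

    val part = new HashPartitioner(128)

    // Hypothetical pair RDDs. Partition both with the same partitioner and cache
    // the one that is reused, so repeated joins against it avoid re-shuffling it.
    val byKey  = bigRdd.partitionBy(part).cache()
    val other  = otherRdd.partitionBy(part)

    val joined = byKey.join(other)           // co-partitioned join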