FWIW, if you do decide to handle language detection on your machine, this
library works great on tweets: https://github.com/carrotsearch/langid-java
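A minimal sketch of calling it from Scala (assuming the LangIdV3 / classify API shown in the project README; I haven't verified accessor names, so the result is just printed via toString):

// Sketch only: assumes langid-java exposes LangIdV3 with a
// classify(CharSequence, boolean) method, as the project README suggests.
import com.carrotsearch.labs.langid.LangIdV3

val langid = new LangIdV3()
val detected = langid.classify("just setting up my twttr", true) // true = normalize confidence
println(detected) // prints the detected language code and confidence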
On Tue, Nov 11, 2014, 7:52 PM Tobias Pfeiffer t...@preferred.jp wrote:
Hi,
On Wed, Nov 12, 2014 at 5:42 AM, SK skrishna...@gmail.com wrote:
But
Just ran into this today myself. I'm on branch-1.0 using a CDH3
cluster (no modifications to Spark or its dependencies). The error
appeared when trying to run GraphX's .connectedComponents() on a ~200GB
edge list (GraphX worked beautifully on smaller data).
Here's the stacktrace (it's quite similar to
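(For reference, the failing job boils down to roughly the following; a sketch with a hypothetical input path, not the exact code:)

import org.apache.spark.graphx.GraphLoader

// Rough sketch of the failing pattern: load a large edge list and label
// every vertex with the id of its connected component.
val graph = GraphLoader.edgeListFile(sc, "hdfs:///data/edge-list") // hypothetical path
val components = graph.connectedComponents().vertices
components.count() // forces the job that blows up at ~200GB scale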
edgeList0 = LOAD '.../bidirectional-network-current/part-r-1'
USING PigStorage() AS (id1:long, id2:long, weight:int);
ttt = LIMIT edgeList0 10;
DUMP ttt;
On Wed, May 28, 2014 at 12:55 PM, Ryan Compton compton.r...@gmail.com wrote:
It appears to be Spark 1.0-related. I made a pom.xml with a single
dependency on Spark and posted a JIRA:
https://issues.apache.org/jira/browse/SPARK-1952
On Wed, May 28, 2014 at 1:14 PM, Ryan Compton compton.r...@gmail.com wrote:
Note: just including the jar built by sbt will produce the same
error, i.e. this Pig script will fail:
REGISTER
/usr/share/osi1/spark-1.0.0/assembly
I use both Pig and Spark. All my code is built with Maven into a giant
*-jar-with-dependencies.jar. I recently upgraded to Spark 1.0 and now
all my Pig scripts fail with:
Caused by: java.lang.RuntimeException: Could not resolve error that
occured when launching map reduce job:
Ryan Compton compton.r...@gmail.com wrote:
I'm trying to shoehorn a label-propagation-ish algorithm into GraphX. I
need to update each vertex with the median value of its neighbors.
Unlike PageRank, which updates each vertex with the mean of its
neighbors, I don't have a simple commutative, associative reduce.
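One way around the missing commutative/associative reduce is to gather the full multiset of neighbor values per vertex (array concatenation is associative) and take the median locally. A sketch against a Graph[Double, _], using aggregateMessages from newer GraphX releases (mapReduceTriplets at the time of this thread):

import org.apache.spark.graphx._

// Gather every neighbor's value per vertex, then compute the median.
// Concatenation is associative, so the aggregation itself is safe; the
// cost is shipping all neighbor values instead of a running summary.
def medianOfNeighbors[ED](g: Graph[Double, ED]): VertexRDD[Double] = {
  val neighborVals = g.aggregateMessages[Array[Double]](
    ctx => {
      ctx.sendToDst(Array(ctx.srcAttr))
      ctx.sendToSrc(Array(ctx.dstAttr))
    },
    _ ++ _
  )
  neighborVals.mapValues { vals =>
    val s = vals.sorted
    val n = s.length
    if (n % 2 == 1) s(n / 2) else (s(n / 2 - 1) + s(n / 2)) / 2.0
  }
}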
I am trying to read an edge list into a Graph. My data looks like
394365859 -- 136153151
589404147 -- 1361045425
I read it into a Graph via:
val edgeFullStrRDD: RDD[String] = sc.textFile(unidirFName)
val edgeTupRDD = edgeFullStrRDD.map(x => x.split("\t"))
  .map(x => ...)
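Assuming each line is tab-separated as "id -- id" (so after the split the ids sit at indices 0 and 2), a completed sketch might be:

import org.apache.spark.rdd.RDD
import org.apache.spark.graphx.{Graph, VertexId}

// Assumption: tab-separated "id -- id" lines; indices 0 and 2 are the ids.
val edgeTupRDD: RDD[(VertexId, VertexId)] =
  sc.textFile(unidirFName)
    .map(_.split("\t"))
    .map(x => (x(0).toLong, x(2).toLong))

val g = Graph.fromEdgeTuples(edgeTupRDD, defaultValue = 1)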
Try this: https://www.dropbox.com/s/xf34l0ta496bdsn/.txt
This code:
println(g.numEdges)
println(g.numVertices)
println(g.edges.distinct().count())
gave me
1
9294
2
On Tue, Apr 22, 2014 at 5:14 PM, Ankur Dave ankurd...@gmail.com wrote:
I wasn't able to reproduce this
Does this continue in newer versions? (I'm on 0.8.0 now)
When I use .distinct() on moderately large datasets (224GB, 8.5B rows,
I'm guessing about 500M are distinct) my jobs fail with:
14/04/17 15:04:02 INFO cluster.ClusterTaskSetManager: Loss was due to
java.io.FileNotFoundException
Btw, I've got System.setProperty("spark.shuffle.consolidate.files",
"true") and use ext3 (CentOS...)
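For what it's worth, in Spark 0.9+ the same setting can go through SparkConf instead of System.setProperty; a sketch with hypothetical app and input names:

import org.apache.spark.{SparkConf, SparkContext}

// spark.shuffle.consolidate.files merges shuffle outputs into fewer files,
// which helps on filesystems like ext3 that cope poorly with huge file counts.
val conf = new SparkConf()
  .setAppName("distinct-count") // hypothetical
  .set("spark.shuffle.consolidate.files", "true")
val sc = new SparkContext(conf)

val nDistinct = sc.textFile("hdfs:///data/rows").distinct().count() // hypothetical path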
On Thu, Apr 17, 2014 at 3:20 PM, Ryan Compton compton.r...@gmail.com wrote:
Does this continue in newer versions? (I'm on 0.8.0 now)
When I use .distinct() on moderately large datasets (224GB, 8.5B
No idea how feasible this is. Has anyone done it?
To clarify: I don't need the actual paths, just the distances.
On Wed, Mar 26, 2014 at 3:04 PM, Ryan Compton compton.r...@gmail.com wrote:
No idea how feasible this is. Has anyone done it?
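For the distances-only case, GraphX later shipped a ShortestPaths utility that returns hop counts to a set of landmark vertices without materializing the paths; a sketch (graph g and landmark ids hypothetical):

import org.apache.spark.graphx.lib.ShortestPaths

// Hop-count distances from every vertex to each landmark; no paths kept.
val landmarks = Seq(1L, 42L) // hypothetical target vertex ids
val distances = ShortestPaths.run(g, landmarks).vertices
// each vertex now carries a Map[VertexId, Int] of distances to reachable landmarks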