Re: Joining data using Latitude, Longitude

2015-03-12 Thread Andrew Musselman
Ted Dunning and Ellen Friedman's "Time Series Databases" has a section on this with some approaches to geo-encoding: https://www.mapr.com/time-series-databases-new-ways-store-and-access-data http://info.mapr.com/rs/mapr/images/Time_Series_Databases.pdf On Tue, Mar 10, 2015 at 3:53 PM, John Meehan

Build error

2015-01-30 Thread Andrew Musselman
Off master, got this error; is that typical? --- T E S T S --- Running org.apache.spark.streaming.mqtt.JavaMQTTStreamSuite Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.495

Re: Row similarities

2015-01-17 Thread Andrew Musselman
ferent similarity methods (long since implemented in hadoop mapreduce) >> hasn’t been moved to spark yet. >> >> Yep, rows are not covered in the blog, my mistake. Too bad it has a lot of >> uses and can at very least be optimized for output matrix symmetry. >> >

Re: Row similarities

2015-01-17 Thread Andrew Musselman
ed in the >> Mahout code tied to LLR. Seems like we should get these together. >> >> On Jan 17, 2015, at 9:37 AM, Andrew Musselman >> wrote: >> >> Excellent, thanks Pat. >> >> On Jan 17, 2015, at 9:27 AM, Pat Ferrel wrote: >> >>

Re: Row similarities

2015-01-17 Thread Andrew Musselman
the downsampling is done as LLR is calculated, so > the entire similarity matrix is never actually calculated unless you disable > downsampling. > > The primary use is for recommenders but I’ve used it (in the test suite) for > row-wise text token similarity too. >

Re: Row similarities

2015-01-17 Thread Andrew Musselman
off using Mahout's RowSimilarityJob for what u r > trying to accomplish. > > 1. It does give u pair-wise distances > 2. U can specify the Distance measure u r looking to use > 3. There's the old MapReduce impl and the Spark DSL impl per ur preference. > > Fro

Re: Row similarities

2015-01-17 Thread Andrew Musselman
to rows that are similar to one another. > >> On Fri, Jan 16, 2015 at 5:18 PM, Andrew Musselman >> wrote: >> What's a good way to calculate similarities between all vector-rows in a >> matrix or RDD[Vector]? >> >> I'm seeing RowMatrix has a co

Re: Maven out of memory error

2015-01-17 Thread Andrew Musselman
I'll try to uncover more later this weekend. Thoughts welcome though. > > On Fri, Jan 16, 2015 at 8:26 PM, Andrew Musselman > wrote: >> Thanks Ted, got farther along but now have a failing test; is this a known >

Row similarities

2015-01-16 Thread Andrew Musselman
What's a good way to calculate similarities between all vector-rows in a matrix or RDD[Vector]? I'm seeing RowMatrix has a columnSimilarities method but I'm not sure I'm going down a good path to transpose a matrix in order to run that.
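A minimal sketch of the idea raised in this question, in plain Python rather than Spark: similarities between rows are the similarities between the columns of the transposed matrix. The helper names (`transpose`, `column_similarities`, `row_similarities`) are hypothetical and are not Spark or Mahout APIs; cosine similarity is assumed as the measure, and only the upper triangle is returned because the result is symmetric.

```python
import math


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def column_similarities(matrix):
    """Cosine similarity for every pair of columns (i < j).

    Hypothetical stand-in for a column-similarity routine; pairs that
    involve an all-zero column are skipped.
    """
    cols = transpose(matrix)
    norms = [math.sqrt(sum(x * x for x in col)) for col in cols]
    sims = {}
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if norms[i] == 0 or norms[j] == 0:
                continue
            dot = sum(a * b for a, b in zip(cols[i], cols[j]))
            sims[(i, j)] = dot / (norms[i] * norms[j])
    return sims


def row_similarities(matrix):
    """Row similarities are column similarities of the transpose."""
    return column_similarities(transpose(matrix))


rows = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
sims = row_similarities(rows)
print(sims)
```

This all-pairs version is quadratic in the number of rows, so it only illustrates the transpose relationship; it says nothing about how to do the transpose efficiently on a distributed matrix.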

Re: Maven out of memory error

2015-01-16 Thread Andrew Musselman
u are probably building with newer Hadoop > profiles and so old-Hadoop support code shows deprecation warnings on > its use of old APIs. > > On Fri, Jan 16, 2015 at 8:03 PM, Andrew Musselman > wrote: > > Just got the latest from Github and tried running `mvn test`; is this

Re: Maven out of memory error

2015-01-16 Thread Andrew Musselman
Job aborted due to stage failure: Maste... On Fri, Jan 16, 2015 at 12:06 PM, Ted Yu wrote: > Can you try doing this before running mvn ? > > export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M > -XX:ReservedCodeCacheSize=512m" > > What OS are you using ? > > Cheers

Maven out of memory error

2015-01-16 Thread Andrew Musselman
Just got the latest from Github and tried running `mvn test`; is this error common and do you have any advice on fixing it? Thanks! [INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile-first) @ spark-core_2.10 --- [WARNING] Zinc server is not available at port 3030 - reverting to normal inc

Subscribe

2015-01-16 Thread Andrew Musselman