Re: [VOTE] Designating maintainers for some Spark components

2014-11-11 Thread Yu Ishikawa
+1 (binding) On Wed, Nov 5, 2014 at 5:33 PM, Matei Zaharia wrote: BTW, my own vote is obviously +1 (binding). Matei On Nov 5, 2014, at 5:31 PM, Matei Zaharia wrote: Hi all, I wanted to share a discussion we've been having on the PMC list,

Re: JIRA + PR backlog

2014-11-11 Thread Yu Ishikawa
Great job! I didn't know about the Spark PR Dashboard. Thanks, Yu Ishikawa

RE: Bind exception while running FlumeEventCount

2014-11-11 Thread Jeniba Johnson
Hi Hari, Yes, I started the Flume agent to push data to the relevant port. Below are the Flume configuration files. Test21.conf:
    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    # Describe/configure the source
    a1.sources.r1.type = avro

Terasort example

2014-11-11 Thread Ewan Higgs
Hi all, I saw that Reynold Xin had a Terasort example PR on GitHub[1]. It didn't appear to be similar to the Hadoop Terasort example, so I've tried to brush it into shape so that it can generate Terasort files (teragen), sort the files (terasort), and validate the files (teravalidate). My branch is
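The three steps described here (generate, sort, validate) can be illustrated with a toy, single-machine Python sketch. It is not the code from the PR and not Hadoop's implementation: the 100-byte record (10-byte key plus 90-byte value) mirrors the usual TeraSort layout, while the row-id padding in the value and the order-independent CRC32-sum checksum are illustrative assumptions.

```python
import random
import zlib

# Assumed record layout for this sketch: 10-byte key + 90-byte value.
KEY_LEN = 10
VALUE_LEN = 90


def teragen(n, seed=0):
    """Generate n records with random keys; the value encodes the row id (hypothetical padding)."""
    rng = random.Random(seed)
    records = []
    for row_id in range(n):
        key = bytes(rng.randrange(256) for _ in range(KEY_LEN))
        value = str(row_id).encode().rjust(VALUE_LEN, b".")
        records.append(key + value)
    return records


def checksum(records):
    """Order-independent checksum: sum of per-record CRC32 values, kept to 64 bits."""
    return sum(zlib.crc32(r) for r in records) & 0xFFFFFFFFFFFFFFFF


def terasort(records):
    """Sort records by key prefix (a local stand-in for the distributed sort)."""
    return sorted(records, key=lambda r: r[:KEY_LEN])


def teravalidate(records, expected_count, expected_checksum):
    """Check that keys are non-decreasing and that no record was lost or altered."""
    in_order = all(
        records[i][:KEY_LEN] <= records[i + 1][:KEY_LEN]
        for i in range(len(records) - 1)
    )
    return (
        in_order
        and len(records) == expected_count
        and checksum(records) == expected_checksum
    )


if __name__ == "__main__":
    data = teragen(1000)
    sorted_data = terasort(data)
    print(teravalidate(sorted_data, len(data), checksum(data)))  # True
```

Computing the checksum before sorting and comparing it afterwards is what lets the validation step catch dropped or corrupted records, not just ordering mistakes.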

Re: thrift jdbc server probably running queries as hive query

2014-11-11 Thread Sadhan Sood
Hi Cheng, I made sure the only Hive server running on the machine is hivethriftserver2: /usr/lib/jvm/default-java/bin/java -cp /usr/lib/hadoop/lib/hadoop-lzo.jar::/mnt/sadhan/spark-3/sbin/../conf:/mnt/sadhan/spark-3/spark-assembly-1.2.0-SNAPSHOT-hadoop2.3.0-cdh5.0.2.jar:/etc/hadoop/conf -Xms512m

Re: [MLlib] Contributing Algorithm for Outlier Detection

2014-11-11 Thread Ashutosh
Hi Mayur, Vector data types are implemented using the Breeze library; they are located at .../org/apache/spark/mllib/linalg. Anant, one restriction I found is that a vector can only hold 'Double' values, which restricts the user. What are your thoughts on the LibSVM format? Thanks for the comments, I
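For reference, the LibSVM text format stores one example per line: a label followed by sparse `index:value` pairs whose indices are one-based and ascending. MLlib's `MLUtils.loadLibSVMFile` reads this format into `LabeledPoint`s with zero-based indices. The dependency-free Python sketch below only illustrates the parsing; it is not MLlib's implementation, and `parse_libsvm_line` and `to_dense` are hypothetical names. As with MLlib vectors, every feature value ends up as a double (Python `float`).

```python
def parse_libsvm_line(line):
    """Parse 'label idx:val idx:val ...' (one-based, ascending indices)
    into (label, zero_based_indices, values). Hypothetical helper."""
    parts = line.split()
    label = float(parts[0])
    indices, values = [], []
    for item in parts[1:]:
        idx, val = item.split(":")
        indices.append(int(idx) - 1)  # convert one-based to zero-based
        values.append(float(val))     # all feature values become doubles
    # LibSVM requires strictly ascending feature indices.
    if any(b <= a for a, b in zip(indices, indices[1:])):
        raise ValueError("feature indices must be strictly ascending")
    return label, indices, values


def to_dense(indices, values, num_features):
    """Expand sparse (indices, values) into a dense list of length num_features."""
    dense = [0.0] * num_features
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


if __name__ == "__main__":
    label, idx, vals = parse_libsvm_line("1 1:0.5 4:2")
    print(label, idx, vals)         # 1.0 [0, 3] [0.5, 2.0]
    print(to_dense(idx, vals, 5))   # [0.5, 0.0, 0.0, 2.0, 0.0]
```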

Re: [MLlib] Contributing Algorithm for Outlier Detection

2014-11-11 Thread slcclimber
Mayur, the LibSVM format sounds good to me. I could work on writing the tests, if that helps. Anant On Nov 11, 2014 11:06 AM, Ashutosh [via Apache Spark Developers List] ml-node+s1001551n9286...@n3.nabble.com wrote: Hi Mayur, Vector data types are implemented using breeze library, it is

Re: MLlib related query

2014-11-11 Thread Xiangrui Meng
I searched for MLlib on Google Scholar and didn't find any :) MLlib implements well-recognized algorithms, each of which may correspond to one or several papers. If you are interested, please find the references in the code. -Xiangrui On Sat, Nov 8, 2014 at 1:37 AM, Manu Kaul manohar.k...@gmail.com

Re: [MLlib] Contributing Algorithm for Outlier Detection

2014-11-11 Thread Ashutosh
Sure, you are welcome. Let me fix the issues you have pointed out; I'll update you by this weekend. _Ashutosh From: slcclimber [via Apache Spark Developers List] ml-node+s1001551n9287...@n3.nabble.com Sent: Tuesday, November 11, 2014 11:46 PM To: Ashutosh

Re: Terasort example

2014-11-11 Thread Reynold Xin
This is great. I think the consensus from last time was that we would put performance stuff into spark-perf, so it is easy to test different Spark versions. On Tue, Nov 11, 2014 at 5:03 AM, Ewan Higgs ewan.hi...@ugent.be wrote: Hi all, I saw that Reynold Xin had a Terasort example PR on

Re: JIRA + PR backlog

2014-11-11 Thread Nicholas Chammas
Yeah, kudos to Josh for putting that together. On Tue, Nov 11, 2014 at 3:26 AM, Yu Ishikawa yuu.ishikawa+sp...@gmail.com wrote: Great jobs! I didn't know Spark PR Dashboard. Thanks Yu Ishikawa

Re: Terasort example

2014-11-11 Thread Ewan Higgs
Shall I move the code to spark-perf, then, and submit a PR there? Or shall I submit a PR to Spark, where it can remain an idiomatic example, and clone it into spark-perf, where it can potentially evolve non-idiomatic optimizations? Yours, Ewan On 11/11/2014 07:58 PM, Reynold Xin wrote: This is

Re: JIRA + PR backlog

2014-11-11 Thread Patrick Wendell
I wonder if we should be linking to that dashboard somewhere from our official docs or the wiki... On Tue, Nov 11, 2014 at 12:23 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Yeah, kudos to Josh for putting that together. On Tue, Nov 11, 2014 at 3:26 AM, Yu Ishikawa

Re: JIRA + PR backlog

2014-11-11 Thread Nicholas Chammas
That's a good idea. To encourage the community to review PRs and JIRA issues, we should link to both the PR dashboard https://spark-prs.appspot.com/ and the stale JIRA filter https://issues.apache.org/jira/browse/SPARK-560?filter=12329614 (or whatever JIRA filter or dashboard we

Re: Terasort example

2014-11-11 Thread Josh Rosen
For now, I’d recommend opening a PR against spark-perf.  It would be great to try to integrate this into the spark-perf harness so that I can run it automatically as part of Spark 1.2.0 release testing.  If you open a rough WIP PR over there, I’ll be able to provide some feedback to help you

Re: JIRA + PR backlog

2014-11-11 Thread Josh Rosen
If possible, it would be great if we could set up a DNS alias, such as prs.spark.apache.org, to point to this app.  I think the appspot domain is being blocked by some users’ firewalls, preventing them from accessing the site. On November 11, 2014 at 1:34:36 PM, Nicholas Chammas

Partition caching taking too long

2014-11-11 Thread Sadhan Sood
While testing Spark SQL on top of our Hive metastore, we tried to cache the data for one partition of the table in memory like this:
    CACHE TABLE xyz_20141029 AS SELECT * FROM xyz WHERE date_prefix = 20141029
Table xyz is a Hive table partitioned by date_prefix. The data is

Re: thrift jdbc server probably running queries as hive query

2014-11-11 Thread Cheng Lian
Hey Sadhan, Sorry for my previous abrupt reply. Submitting an MR job is definitely wrong here; I'm investigating. Would you mind providing the Spark/Hive/Hadoop versions you are using? If you're using the most recent master branch, a concrete commit SHA-1 would be very helpful. Thanks! Cheng

[NOTICE] [BUILD] Minor changes to Spark's build

2014-11-11 Thread Patrick Wendell
Hey all, I've just merged a patch that adds support for Scala 2.11, which will have some minor implications for the build. These are due to the complexity of supporting two versions of Scala in a single project. 1. The JDBC server will now require a special flag to build -Phive-thriftserver on

Spark-Submit issues

2014-11-11 Thread Jeniba Johnson
Hi Hari, I am now trying the same FlumeEventCount example using spark-submit instead of run-example. The steps I followed: I exported JavaFlumeEventCount.java into a jar. The command used is: ./bin/spark-submit --jars lib/spark-examples-1.1.0-hadoop1.0.4.jar --master