Re: [BUILD FAILURE] Spark Project ML Local Library - me or it's real?

2016-04-09 Thread Ted Yu
The broken build was caused by the following: [SPARK-14462][ML][MLLIB] add the mllib-local build to maven pom See https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7/607/ FYI On Sat, Apr 9, 2016 at 12:01 PM, Jacek Laskowski wrote: > Hi, > > Is this me or the build is

Re: [BUILD FAILURE] Spark Project ML Local Library - me or it's real?

2016-04-09 Thread Ted Yu
Sent PR: https://github.com/apache/spark/pull/12276 I was able to get the build going past the mllib-local module. FYI On Sat, Apr 9, 2016 at 12:40 PM, Ted Yu wrote: > The broken build was caused by the following: > > [SPARK-14462][ML][MLLIB] add the mllib-local build to maven pom > &

Re: spark graphx storage RDD memory leak

2016-04-10 Thread Ted Yu
I see the following code toward the end of the method: // Unpersist the RDDs hidden by newly-materialized RDDs oldMessages.unpersist(blocking = false) prevG.unpersistVertices(blocking = false) prevG.edges.unpersist(blocking = false) Wouldn't the above achieve the same effect?
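For readers skimming the archive, a minimal sketch of the pattern being quoted (names here are illustrative, not the Pregel code itself): materialize the replacement RDD first, then unpersist the one it supersedes.

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.rdd.RDD

  object IterativeUnpersistSketch {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(
        new SparkConf().setAppName("iterative-unpersist").setMaster("local[*]"))
      var current: RDD[Int] = sc.parallelize(1 to 1000).cache()
      for (_ <- 1 to 10) {
        val next = current.map(_ + 1).cache()
        next.count()                        // materialize the replacement first
        current.unpersist(blocking = false) // then drop the RDD it hides
        current = next
      }
      sc.stop()
    }
  }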

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-04-11 Thread Ted Yu
Gentle ping: spark-1.6.1-bin-hadoop2.4.tgz from S3 is still corrupt. On Wed, Apr 6, 2016 at 12:55 PM, Josh Rosen wrote: > Sure, I'll take a look. Planning to do full verification in a bit. > > On Wed, Apr 6, 2016 at 12:54 PM Ted Yu wrote: > >> Josh: >> Can you ch

Re: BytesToBytes and unaligned memory

2016-04-15 Thread Ted Yu
I assume you tested 2.0 with SPARK-12181. Related code from Platform.java, for when java.nio.Bits#unaligned() throws an exception: // We at least know x86 and x64 support unaligned access. String arch = System.getProperty("os.arch", ""); //noinspection DynamicRegexReplaceableByCompiledPa
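A rough sketch of the check being described, using plain JDK reflection; the fallback regex is an approximation of the x86/x64 allow-list quoted above, not necessarily the exact expression in Platform.java.

  object UnalignedCheckSketch {
    def unalignedSupported(): Boolean =
      try {
        // Ask the JDK directly; java.nio.Bits#unaligned() is package-private.
        val bits = Class.forName("java.nio.Bits", false, getClass.getClassLoader)
        val m = bits.getDeclaredMethod("unaligned")
        m.setAccessible(true)
        m.invoke(null).asInstanceOf[Boolean]
      } catch {
        case _: Throwable =>
          // Fallback in the spirit of the code quoted above: assume common
          // x86 variants support unaligned access.
          val arch = System.getProperty("os.arch", "")
          arch.matches("^(i[3-6]86|x86(_64)?|x64|amd64)$")
      }

    def main(args: Array[String]): Unit =
      println(s"unaligned access supported: ${unalignedSupported()}")
  }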

Re: BytesToBytes and unaligned memory

2016-04-15 Thread Ted Yu
unaligned);* > } > } > > Output is, as you'd expect, "used reflection and _unaligned is false, > setting to true anyway for experimenting", and the tests pass. > > No other problems on the platform (pending a different pull request). > > Cheers, > >

Re: BytesToBytes and unaligned memory

2016-04-15 Thread Ted Yu
added > > Cheers, > > > > > From:Ted Yu > To:Adam Roberts/UK/IBM@IBMGB > Cc:"dev@spark.apache.org" > Date:15/04/2016 16:43 > Subject:Re: BytesToBytes and unaligned memory > -- > &g

Re: BytesToBytes and unaligned memory

2016-04-18 Thread Ted Yu
unaligned memory access on a platform where unaligned memory access is > definitely not supported for shorts/ints/longs. > > if these tests continue to pass then I think the Spark tests don't > exercise unaligned memory access, cheers > > > > > > > > From:

Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Ted Yu
I had one PR which got merged after 3 months. If the inactivity was due to the contributor, I think it can be closed after 30 days. But if the inactivity was due to a lack of review, the PR should be kept open. On Mon, Apr 18, 2016 at 12:17 PM, Cody Koeninger wrote: > For what it's worth, I have defi

Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Ted Yu
filtered for non-mergeable PRs or instead left a comment > asking the author to respond if they are still available to move the PR > forward - and close the ones where they don't respond for a week? > > Just a suggestion. > On Monday, April 18, 2016, Ted Yu wrote: > >>

Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Ted Yu
en but the cost to reopen is approximately zero (i.e. > click a button on the pull request). > > > On Mon, Apr 18, 2016 at 12:41 PM, Ted Yu wrote: > >> bq. close the ones where they don't respond for a week >> >> Does this imply that the script understands re

Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Ted Yu
ing people look at the pull requests that have been inactive >>> for a >>> > long time. That seems equally likely (or unlikely) as committers >>> looking at >>> > the recently closed pull requests. >>> > >>> > In either case, most

Re: auto closing pull requests that have been inactive > 30 days?

2016-04-18 Thread Ted Yu
s; a higher barrier to contributing; a combination >> thereof; etc... >> >> Also relevant: http://danluu.com/discourage-oss/ >> >> By the way, some people noted that closing PRs may discourage >> contributors. I think our open PR count alone is very discouraging. Und

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Ted Yu
Corrected the typo in the subject. I want to note that the hbase-spark module in HBase is incomplete; Zhan has several patches pending review. The hbase-spark module is currently only in the master branch, which would be released as 2.0. However, the release date for 2.0 is unclear - probably half a year from now.

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Ted Yu
On Tue, Apr 19, 2016 at 10:23 AM, Marcelo Vanzin wrote: > On Tue, Apr 19, 2016 at 10:20 AM, Ted Yu wrote: > > I want to note that the hbase-spark module in HBase is incomplete. Zhan > has > > several patches pending review. > > I wouldn't call it "incomplete

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Ted Yu
bq. create a separate tarball for them Probably another thread can be started for the above. I am fine with it. On Tue, Apr 19, 2016 at 10:34 AM, Marcelo Vanzin wrote: > On Tue, Apr 19, 2016 at 10:28 AM, Reynold Xin wrote: > > Yea in general I feel examples that bring in a large amount of > de

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Ted Yu
se's input > formats), which makes it not very useful as a blueprint for developing > HBase apps with Spark. > > On Tue, Apr 19, 2016 at 10:28 AM, Ted Yu wrote: > > bq. I wouldn't call it "incomplete". > > > > I would call it incomplete. > > &g

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Ted Yu
'bq.' is used in JIRA to quote what other people have said. On Tue, Apr 19, 2016 at 10:42 AM, Reynold Xin wrote: > Ted - what's the "bq" thing? Are you using some 3rd party (e.g. Atlassian) > syntax? They are not being rendered in email. > > > On Tue,

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Ted Yu
> > While you're at it, here's some much better documentation, from the > HBase project themselves, than what the Spark example provides: > http://hbase.apache.org/book.html#spark > > On Tue, Apr 19, 2016 at 10:41 AM, Ted Yu wrote: > > bq. it's actually in use

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Ted Yu
too > many dependencies for something that is not really useful, is why I'm > suggesting removing it. > > > On Tue, Apr 19, 2016 at 10:47 AM, Ted Yu wrote: > > There is an Open JIRA for fixing the documentation: HBASE-15473 > > > > I would say the refguide li

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Ted Yu
> On Tue, Apr 19, 2016 at 1:59 PM, Ted Yu wrote: > >> bq. HBase's current support, even if there are bugs or things that still >> need to be done, is much better than the Spark example >> >> In my opinion, a simple example that works is better than a buggy

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Ted Yu
: > On Tue, Apr 19, 2016 at 11:07 AM, Ted Yu wrote: > >> The same question can be asked w.r.t. examples for other projects, such >> as flume and kafka. >> > > The main difference being that flume and kafka integration are part of > Spark itself. HBase integratio

Re: Improving system design logging in spark

2016-04-20 Thread Ted Yu
Interesting. For #3: bq. reading data from, I guess you meant reading from disk. On Wed, Apr 20, 2016 at 10:45 AM, atootoonchian wrote: > Current spark logging mechanism can be improved by adding the following > parameters. It will help in understanding system bottlenecks and provide > useful

Re: [Spark-SQL] Reduce Shuffle Data by pushing filter toward storage

2016-04-21 Thread Ted Yu
Interesting analysis. Can you log a JIRA? > On Apr 21, 2016, at 11:07 AM, atootoonchian wrote: > > SQL query planner can have intelligence to push down filter commands towards > the storage layer. If we optimize the query planner such that the IO to the > storage is reduced at the cost of run
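To make the idea concrete, a small sketch of the query shape being discussed (the path and column names are made up): expressing the predicate on the scanned DataFrame lets the planner push it toward the source instead of filtering after a shuffle.

  import org.apache.spark.sql.SparkSession

  object FilterPushdownSketch {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("filter-pushdown").master("local[*]").getOrCreate()
      val events = spark.read.parquet("/data/events")

      // The predicate can be evaluated at the scan, so far less data reaches
      // any later join or aggregation.
      val filtered = events.filter(events("country") === "US")
      filtered.explain(true) // inspect the physical plan for pushed filters
      spark.stop()
    }
  }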

Re: RFC: Remote "HBaseTest" from examples?

2016-04-21 Thread Ted Yu
Zhan: I have mentioned the JIRA numbers in the thread starting with (note the typo in the subject of this thread): RFC: Remove ... On Thu, Apr 21, 2016 at 1:28 PM, Zhan Zhang wrote: > FYI: There are several pending patches for DataFrame support on top of > HBase. > > Thanks. > > Zhan Zhang > > On A

Re: Cache Shuffle Based Operation Before Sort

2016-04-25 Thread Ted Yu
Interesting. bq. details of execution for 10 and 100 scale factor input Looks like some chart (or image) didn't go through. FYI On Mon, Apr 25, 2016 at 12:50 PM, Ali Tootoonchian wrote: > Caching shuffle RDD before the sort process improves system performance. > SQL > planner can be intellige

Re: Number of partitions for binaryFiles

2016-04-26 Thread Ted Yu
Here is the body of StreamFileInputFormat#setMinPartitions: def setMinPartitions(context: JobContext, minPartitions: Int) { val totalLen = listStatus(context).asScala.filterNot(_.isDirectory).map(_.getLen).sum val maxSplitSize = math.ceil(totalLen / math.max(minPartitions, 1.0)).toLong
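A standalone sketch of that arithmetic, using the file sizes mentioned in the follow-up mail (36 files of ~600 KB plus 74 of ~400 KB, about 50 MB in total), shows why a small minPartitions can group many small files into a single split.

  object MinPartitionsSketch {
    // Mirrors the quoted computation: target split size is the total input
    // size divided by the requested minimum number of partitions.
    def maxSplitSize(totalLen: Long, minPartitions: Int): Long =
      math.ceil(totalLen / math.max(minPartitions, 1.0)).toLong

    def main(args: Array[String]): Unit = {
      val totalLen = 36L * 600 * 1024 + 74L * 400 * 1024 // roughly 50 MB
      // With a small minPartitions, one split can cover dozens of files.
      println(maxSplitSize(totalLen, minPartitions = 2))
    }
  }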

Re: Number of partitions for binaryFiles

2016-04-26 Thread Ted Yu
> Hi Ted, > > > > I have 36 files of size ~600KB and the rest 74 are about 400KB. > > > > Is there a workaround rather than changing Sparks code? > > > > Best regards, Alexander > > > > *From:* Ted Yu [mailto:yuzhih...@gmail.com] > *Sent:* Tu

Re: spark 2 segfault

2016-05-02 Thread Ted Yu
On May 2, 2016 12:09 AM, "Ted Yu" wrote: >> Using commit hash 90787de864b58a1079c23e6581381ca8ffe7685f and Java 1.7.0_67 >> , I got: >> >> scala> val dfComplicated = sc.parallelize(List((Map("1" -> "a"), List

Re: spark 2 segfault

2016-05-02 Thread Ted Yu
>> Created issue: >> https://issues.apache.org/jira/browse/SPARK-15062 >> >> On Mon, May 2, 2016 at 6:48 AM, Ted Yu wrote: >>> I tried the same statement using Spark 1.6.1 >>> There was no error with default memory setting. >>> >>> Suggest logging a

Re: SQLContext and "stable identifier required"

2016-05-03 Thread Ted Yu
Have you tried the following? scala> import spark.implicits._ import spark.implicits._ scala> spark res0: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@323d1fa2 Cheers On Tue, May 3, 2016 at 9:16 AM, Koert Kuipers wrote: > with the introduction of SparkSession SQLCont
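For anyone hitting the same "stable identifier required" error, a minimal sketch of the point: implicits can only be imported from a stable identifier (a val or an object), so bind the session to a val before importing.

  import org.apache.spark.sql.SparkSession

  object StableIdentifierSketch {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("stable-id").master("local[*]").getOrCreate()
      import spark.implicits._ // ok: `spark` is a val, hence a stable identifier

      Seq(1, 2, 3).toDS().show()
      spark.stop()
    }
  }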

Re: Proposal of closing some PRs and maybe some PRs abandoned by its author

2016-05-06 Thread Ted Yu
PR #10572 was listed twice. In the future, is it possible to include the contributor's handle beside the PR number so that people can easily recognize their own PR? Thanks On Fri, May 6, 2016 at 8:45 AM, Hyukjin Kwon wrote: > Hi all, > > > This was similar with the proposal of closing PRs bef

Re: Cache Shuffle Based Operation Before Sort

2016-05-08 Thread Ted Yu
I assume there were supposed to be images following this line (which I don't see in the email thread): bq. Let’s look at details of execution for 10 and 100 scale factor input Consider using a 3rd-party image site. On Sun, May 8, 2016 at 5:17 PM, Ali Tootoonchian wrote: > Thanks for your comment
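As a rough, hedged illustration of the proposal (the data, partition count and storage level here are made up, not taken from the benchmark in the thread): persist the shuffled RDD so that later actions over the sort lineage reuse the cached shuffle output instead of recomputing it.

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.storage.StorageLevel

  object CacheBeforeSortSketch {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(
        new SparkConf().setAppName("cache-before-sort").setMaster("local[*]"))
      val pairs = sc.parallelize(1 to 1000000).map(i => (i % 1000, i))

      // Cache the shuffled (repartitioned) data before sorting.
      val shuffled = pairs.repartition(8).persist(StorageLevel.MEMORY_AND_DISK)
      val sorted = shuffled.sortByKey()

      sorted.count()  // first action populates the cache while shuffling
      sorted.take(10) // subsequent actions read `shuffled` from the cache
      sc.stop()
    }
  }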

Re: Structured Streaming with Kafka source/sink

2016-05-11 Thread Ted Yu
Please see this thread: http://search-hadoop.com/m/q3RTt9XAz651PiG/Adhoc+queries+spark+streaming&subj=Re+Adhoc+queries+on+Spark+2+0+with+Structured+Streaming > On May 11, 2016, at 1:47 AM, Ofir Manor wrote: > > Hi, > I'm trying out Structured Streaming from current 2.0 branch. > Does the branch

Re: dataframe udf functioin will be executed twice when filter on new column created by withColumn

2016-05-11 Thread Ted Yu
In the master branch, the behavior is the same. I suggest opening a JIRA if you haven't done so. On Wed, May 11, 2016 at 6:55 AM, Tony Jin wrote: > Hi guys, > > I have a problem about spark DataFrame. My spark version is 1.6.1. > Basically, i used udf and df.withColumn to create a "new" column, and then
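A small sketch that reproduces the shape of the report (column and function names are made up; the println simply makes any double evaluation visible):

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.udf

  object UdfTwiceSketch {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("udf-twice").master("local[*]").getOrCreate()
      import spark.implicits._

      val expensive = udf { (s: String) =>
        println(s"udf called with $s") // count the calls per input row
        s.length
      }

      // The UDF-derived column is used both in the projection and the filter.
      val df = Seq("a", "bb", "ccc").toDF("value")
        .withColumn("len", expensive($"value"))
        .filter($"len" > 1)
      df.show()
      spark.stop()
    }
  }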

Re: Query parsing error for the join query between different database

2016-05-18 Thread Ted Yu
Which release of Spark / Hive are you using? Cheers > On May 18, 2016, at 6:12 AM, JaeSung Jun wrote: > Hi, > > I'm working on custom data source provider, and i'm using fully qualified > table name in FROM clause like following : > > SELECT user. uid, dept.name > FROM userdb.user user, d

Re: Quick question on spark performance

2016-05-20 Thread Ted Yu
Yash: Can you share the JVM parameters you used? How many partitions are there in your data set? Thanks On Fri, May 20, 2016 at 5:59 PM, Reynold Xin wrote: > It's probably due to GC. > > On Fri, May 20, 2016 at 5:54 PM, Yash Sharma wrote: > >> Hi All, >> I am here to get some expert advice

Re: Using Travis for JDK7/8 compilation and lint-java.

2016-05-22 Thread Ted Yu
The following line was listed twice: - For Oracle JDK7, mvn -DskipTests install and run `dev/lint-java`. Did you intend to cover JDK 8? Cheers On Sun, May 22, 2016 at 1:25 PM, Dongjoon Hyun wrote: > Hi, All. > > I want to propose the followings. > > - Turn on Travis CI for Apache Spark PR

Re: Using Travis for JDK7/8 compilation and lint-java.

2016-05-22 Thread Ted Yu
- For Oracle JDK8, mvn -DskipTests install and run `dev/lint-java`. > > Thank you, Ted. > > Dongjoon. > > On Sun, May 22, 2016 at 1:29 PM, Ted Yu wrote: > >> The following line was repeated twice: >> >> - For Oracle JDK7, mvn -DskipTests install and run `dev/lint-java`

Re: Using Travis for JDK7/8 compilation and lint-java.

2016-05-23 Thread Ted Yu
and amend your commit title or messages, see the Travis CI. > Or, you can monitor Travis CI result on status menu bar. > If it shows green icon, you have nothing to do. > >https://docs.travis-ci.com/user/apps/ > > To sum up, I think we don't need to wait f

Re: Running TPCDSQueryBenchmark results in java.lang.OutOfMemoryError

2016-05-23 Thread Ted Yu
Can you tell us the commit hash with which the test was run? For #2, if you can give the full stack trace, that would be nice. Thanks On Mon, May 23, 2016 at 8:58 AM, Ovidiu-Cristian MARCU < ovidiu-cristian.ma...@inria.fr> wrote: > Hi > > 1) Using latest spark 2.0 I've managed to run TPCDSQueryBe

Re: Using Travis for JDK7/8 compilation and lint-java.

2016-05-23 Thread Ted Yu
>>> And, don't worry, Ted. >>> >>> Travis launches new VMs for every PR. >>> >>> Apache Spark repository uses the following setting. >>> >>> VM: Google Compute Engine >>> OS: Ubuntu 14.04.3 LTS Server Edition 64bit >>

Re: ClassCastException: SomeCaseClass cannot be cast to org.apache.spark.sql.Row

2016-05-24 Thread Ted Yu
Please log a JIRA. Thanks On Tue, May 24, 2016 at 8:33 AM, Koert Kuipers wrote: > hello, > as we continue to test spark 2.0 SNAPSHOT in-house we ran into the > following trying to port an existing application from spark 1.6.1 to spark > 2.0.0-SNAPSHOT. > > given this code: > > case class Test(a

Re: Welcoming Yanbo Liang as a committer

2016-06-03 Thread Ted Yu
Congratulations, Yanbo. On Fri, Jun 3, 2016 at 7:48 PM, Matei Zaharia wrote: > Hi all, > > The PMC recently voted to add Yanbo Liang as a committer. Yanbo has been a > super active contributor in many areas of MLlib. Please join me in > welcoming Yanbo! > > Matei > --

Re: Can't compile 2.0-preview with scala 2.10

2016-06-06 Thread Ted Yu
See the following from https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-master-COMPILE-sbt-SCALA-2.10/1642/consoleFull : + SBT_FLAGS+=('-Dscala-2.10') + ./dev/change-scala-version.sh 2.10 FYI On Mon, Jun 6, 2016 at 10:35 AM, Franklyn D'souza < franklyn.dso...@shopify.

Re: Can't use UDFs with Dataframes in spark-2.0-preview scala-2.10

2016-06-07 Thread Ted Yu
With commit 200f01c8fb15680b5630fbd122d44f9b1d096e02 using Scala 2.11: Using Python version 2.7.9 (default, Apr 29 2016 10:48:06) SparkSession available as 'spark'. >>> from pyspark.sql import SparkSession >>> from pyspark.sql.types import IntegerType, StructField, StructType >>> from pyspark.sql.

Re: Dataset API agg question

2016-06-07 Thread Ted Yu
Have you tried the following? Seq(1->2, 1->5, 3->6).toDS("a", "b") Then you can refer to columns by name. FYI On Tue, Jun 7, 2016 at 3:58 PM, Alexander Pivovarov wrote: > I'm trying to switch from RDD API to Dataset API > My question is about reduceByKey method > > e.g. in the following exa
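For comparison, a sketch of one way to get named columns plus a reduceByKey-style aggregation; this uses toDF for the column names (an assumption on my part rather than something confirmed in the thread) and groupBy/agg for the reduction.

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.sum

  object AggByNameSketch {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("agg-by-name").master("local[*]").getOrCreate()
      import spark.implicits._

      val df = Seq(1 -> 2, 1 -> 5, 3 -> 6).toDF("a", "b")
      // Equivalent in spirit to reduceByKey(_ + _): a=1 sums to 7, a=3 to 6.
      df.groupBy("a").agg(sum("b")).show()
      spark.stop()
    }
  }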

Re: Can't use UDFs with Dataframes in spark-2.0-preview scala-2.10

2016-06-07 Thread Ted Yu
I built with Scala 2.10 >>> df.select(add_one(df.a).alias('incremented')).collect() The above just hung. On Tue, Jun 7, 2016 at 3:31 PM, franklyn wrote: > Thanks Ted !. > > I'm using > > https://github.com/apache/spark/commit/8f5a04b6299e3a47aca13cbb40e72344c0114860 > and building with scala-2

Re: Can't use UDFs with Dataframes in spark-2.0-preview scala-2.10

2016-06-07 Thread Ted Yu
Please go ahead. On Tue, Jun 7, 2016 at 4:45 PM, franklyn wrote: > Thanks for reproducing it Ted, should i make a Jira Issue?. > > > > -- > View this message in context: > http://apache-spark-developers-list.1001551.n3.nabble.com/Can-t-use-UDFs-with-Dataframes-in-spark-2-0-preview-scala-2-10-tp1

Re: Kryo registration for Tuples?

2016-06-08 Thread Ted Yu
I think the second group (3 classOf's) should be used. Cheers On Wed, Jun 8, 2016 at 4:53 PM, Alexander Pivovarov wrote: > if my RDD is RDD[(String, (Long, MyClass))] > > Do I need to register > > classOf[MyClass] > classOf[(Any, Any)] > > or > > classOf[MyClass] > classOf[(Long, MyClass)] > cl
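A hedged sketch of how that registration might look; MyClass is a stand-in for the real class, and whether every nested tuple signature needs registering depends on how the RDD is actually serialized, so treat the list as a starting point.

  import org.apache.spark.{SparkConf, SparkContext}

  case class MyClass(id: Long, name: String) // stand-in for the user's class

  object KryoRegistrationSketch {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf()
        .setAppName("kryo-registration").setMaster("local[*]")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .registerKryoClasses(Array(
          classOf[MyClass],
          classOf[(Long, MyClass)],
          classOf[(String, (Long, MyClass))]))

      val sc = new SparkContext(conf)
      val rdd = sc.parallelize(Seq(("k", (1L, MyClass(1L, "a")))))
      println(rdd.count())
      sc.stop()
    }
  }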

Re: [VOTE] Release Apache Spark 1.6.2 (RC1)

2016-06-17 Thread Ted Yu
Docker Integration Tests failed on Linux: http://pastebin.com/Ut51aRV3 Here was the command I used: mvn clean -Phive -Phive-thriftserver -Pyarn -Phadoop-2.6 -Psparkr -Dhadoop.version=2.7.0 package Has anyone seen a similar error? Thanks On Thu, Jun 16, 2016 at 9:49 PM, Reynold Xin wrote: > P

Re: Hello

2016-06-17 Thread Ted Yu
You can use a JIRA filter to find JIRAs of the component(s) you're interested in. Then sort by Priority. Maybe comment on the JIRA if you want to work on it. On Fri, Jun 17, 2016 at 3:22 PM, Pedro Rodriguez wrote: > What is the best way to determine what the library maintainers believe is > imp

Re: does shark0.9.1 work well with hadoop2.2.0 ?

2014-04-20 Thread Ted Yu
bq. I have tried replace protobuf2.4.1 in shark with protobuf2.5.0 Did you replace the jar file or did you change the following in pom.xml and rebuild? 2.4.1 Cheers On Sun, Apr 20, 2014 at 3:45 AM, qingyang li wrote: > shark 0.9.1 is using protobuf 2.4.1 , but hadoop2.2.0 is using > proto

Re: (test)

2014-05-16 Thread Ted Yu
Yes. On Thu, May 15, 2014 at 10:34 AM, Andrew Or wrote: > Apache has been having some problems lately. Do you guys see this message? >

Re: Compile failure with SBT on master

2014-06-16 Thread Ted Yu
I used the same command on Linux and it passed: Linux k.net 2.6.32-220.23.1.el6.YAHOO.20120713.x86_64 #1 SMP Fri Jul 13 11:40:51 CDT 2012 x86_64 x86_64 x86_64 GNU/Linux Cheers On Mon, Jun 16, 2014 at 9:29 PM, Andrew Ash wrote: > I can't run sbt/sbt gen-idea on a clean checkout of Spark master

Re: Compile failure with SBT on master

2014-06-17 Thread Ted Yu
:xnu-2050.48.12~1/RELEASE_X86_64 x86_64 On Mon, Jun 16, 2014 at 10:04 PM, Andrew Ash wrote: > Maybe it's a Mac OS X thing? > > > On Mon, Jun 16, 2014 at 9:57 PM, Ted Yu wrote: > > > I used the same command on Linux and it passed: > > > > Linux k.net 2.6.32

Re: (send this email to subscribe)

2014-07-08 Thread Ted Yu
See http://spark.apache.org/news/spark-mailing-lists-moving-to-apache.html Cheers On Jul 8, 2014, at 4:17 AM, Leon Zhang wrote: >

Re: (send this email to subscribe)

2014-07-08 Thread Ted Yu
This is the correct page: http://spark.apache.org/community.html Cheers On Jul 8, 2014, at 4:43 AM, Ted Yu wrote: > See http://spark.apache.org/news/spark-mailing-lists-moving-to-apache.html > > Cheers > > On Jul 8, 2014, at 4:17 AM, Leon Zhang wrote: > >>

Re: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-25 Thread Ted Yu
HADOOP-10456 is fixed in Hadoop 2.4.1. Does this mean that synchronization on HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK can be bypassed for Hadoop 2.4.1? Cheers On Fri, Jul 25, 2014 at 6:00 PM, Patrick Wendell wrote: > The most important issue in this release is actually an ammendment to > a
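For context, a minimal sketch (not Spark's actual code) of the kind of guard under discussion: constructing Hadoop Configuration objects behind a single lock because the constructor was not thread-safe before HADOOP-10456.

  import org.apache.hadoop.conf.Configuration

  object ConfFactory {
    // On Hadoop >= 2.4.1 (HADOOP-10456) this lock should no longer be needed.
    private val CONFIGURATION_INSTANTIATION_LOCK = new Object()

    def newConfiguration(): Configuration =
      CONFIGURATION_INSTANTIATION_LOCK.synchronized {
        new Configuration()
      }
  }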

setting inputMetrics in HadoopRDD#compute()

2014-07-26 Thread Ted Yu
Hi, Starting at line 203: try { /* bytesRead may not exactly equal the bytes read by a task: split boundaries aren't * always at record boundaries, so tasks may need to read into other splits to complete * a record. */ inputMetrics.bytesRead = split.inputSpli

Utilize newer hadoop releases WAS: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-27 Thread Ted Yu
s good to be conservative in what we expect of the > Hadoop client libraries. > > If you'd like to discuss this further, please fork a new thread, since > this is a vote thread. Thanks! > > On Fri, Jul 25, 2014 at 10:14 PM, Ted Yu wrote: > > HADOOP-10456 is fixed i

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Ted Yu
I found 0.13.1 artifacts in Maven: http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-metastore%7C0.13.1%7Cjar However, Spark uses a groupId of org.spark-project.hive, not org.apache.hive. Can someone tell me how it is supposed to work? Cheers On Mon, Jul 28, 2014 at 7:44 AM, Stev

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Ted Yu
perhaps sort out its dependencies manually in your build. > > On Mon, Jul 28, 2014 at 4:01 PM, Ted Yu wrote: > > I found 0.13.1 artifacts in maven: > > > http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-metastore%7C0.13.1%7Cjar > > > > Howev

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Ted Yu
now. > I don't think hive-metastore is related to this question? > > I am no expert on the Hive artifacts, just remembering what the issue > was initially in case it helps you get to a similar solution. > > On Mon, Jul 28, 2014 at 4:47 PM, Ted Yu wrote: > > hive-exec (as o

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Ted Yu
sue. If not, we'll > have to continue forking our own version of Hive to change the way it > publishes artifacts. > > - Patrick > > On Mon, Jul 28, 2014 at 9:34 AM, Ted Yu wrote: > > Talked with Owen offline. He confirmed that as of 0.13, hive-exec is > still > >

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Ted Yu
> pull in hive-exec-core.jar > >> >> Cheers > >> >> > >> >> On Mon, Jul 28, 2014 at 9:55 AM, Patrick Wendell > > >> >> wrote: > >> >> > >> >> > It woul

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Ted Yu
3:41 PM, Ted Yu wrote: > After manually copying hive 0.13.1 jars to local maven repo, I got the > following errors when building spark-hive_2.10 module : > > [ERROR] > /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala:182: > type mismat

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Ted Yu
> > One question I have is, What is the goal of upgrading to hive 0.13.0? Is > it purely because you are having problems connecting to newer metastores? > Are there some features you are hoping for? This will help me prioritize > this effort. > > Michael > > > On Mo

Re: Working Formula for Hive 0.13?

2014-07-30 Thread Ted Yu
I found SPARK-2706. Let me attach a tentative patch there - I still face a compilation error. Cheers On Mon, Jul 28, 2014 at 5:59 PM, Ted Yu wrote: > bq. Either way its unclear to if there is any reason to use reflection to > support multiple versions, instead of just upgrading to Hive

Re: subscribe dev list for spark

2014-07-30 Thread Ted Yu
See Mailing list section of: https://spark.apache.org/community.html On Wed, Jul 30, 2014 at 6:53 PM, Grace wrote: > >

Re: failed to build spark with maven for both 1.0.1 and latest master branch

2014-07-31 Thread Ted Yu
The following command succeeded (on Linux) on Spark master checked out this morning: mvn -Pyarn -Phive -Phadoop-2.4 -DskipTests install FYI On Thu, Jul 31, 2014 at 1:36 PM, yao wrote: > Hi TD, > > I've asked my colleagues to do the same thing but compile still fails. > However, maven build su

compilation error in Catalyst module

2014-08-06 Thread Ted Yu
I refreshed my workspace. I got the following error with this command: mvn -Pyarn -Phive -Phadoop-2.4 -DskipTests install [ERROR] bad symbolic reference. A signature in package.class refers to term scalalogging in package com.typesafe which is not available. It may be completely missing from the

Re: compilation error in Catalyst module

2014-08-06 Thread Ted Yu
Forgot to do that step. Now compilation passes. On Wed, Aug 6, 2014 at 1:36 PM, Zongheng Yang wrote: > Hi Ted, > > By refreshing do you mean you have done 'mvn clean'? > > On Wed, Aug 6, 2014 at 1:17 PM, Ted Yu wrote: > > I refreshed my workspace. > >

Re: Unit tests in < 5 minutes

2014-08-08 Thread Ted Yu
How about using the parallel execution feature of maven-surefire-plugin (assuming all the tests were made parallel-friendly)? http://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html Cheers On Fri, Aug 8, 2014 at 9:14 AM, Sean Owen wrote: > A commo

reference to dstream in package org.apache.spark.streaming which is not available

2014-08-22 Thread Ted Yu
Hi, Using the following command on (refreshed) master branch: mvn clean package -DskipTests I got: constituent[36]: file:/homes/hortonzy/apache-maven-3.1.1/conf/logging/ --- java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAcce

Re: Dependency hell in Spark applications

2014-09-05 Thread Ted Yu
>From output of dependency:tree: [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ spark-streaming_2.10 --- [INFO] org.apache.spark:spark-streaming_2.10:jar:1.1.0-SNAPSHOT INFO] +- org.apache.spark:spark-core_2.10:jar:1.1.0-SNAPSHOT:compile [INFO] | +- org.apache.hadoop:hadoop-client:ja

BasicOperationsSuite failing ?

2014-09-29 Thread Ted Yu
Hi, Running the test suite on trunk, I got: BasicOperationsSuite: - map - flatMap - filter - glom - mapPartitions - repartition (more partitions) - repartition (fewer partitions) - groupByKey - red

Re: Extending Scala style checks

2014-10-01 Thread Ted Yu
Please take a look at WhitespaceEndOfLineChecker under: http://www.scalastyle.org/rules-0.1.0.html Cheers On Wed, Oct 1, 2014 at 2:01 PM, Nicholas Chammas wrote: > As discussed here , it would be > good to extend our Scala style checks to programmatica

Re: something wrong with Jenkins or something untested merged?

2014-10-20 Thread Ted Yu
I performed a build on the latest master branch but didn't get a compilation error. FYI On Mon, Oct 20, 2014 at 3:51 PM, Nan Zhu wrote: > Hi, > > I just submitted a patch https://github.com/apache/spark/pull/2864/files > with one line change > > but the Jenkins told me it's failed to compile on the unr

Re: scalastyle annoys me a little bit

2014-10-23 Thread Ted Yu
Koert: Have you tried adding the following to your command line? -Dscalastyle.failOnViolation=false Cheers On Thu, Oct 23, 2014 at 11:07 AM, Patrick Wendell wrote: > Hey Koert, > > I think disabling the style checks in maven package could be a good > idea for the reason you point out. I was so

Re: scalastyle annoys me a little bit

2014-10-23 Thread Ted Yu
le execution: You have 3 Scalastyle > violation(s). -> [Help 1] > > > On Thu, Oct 23, 2014 at 2:14 PM, Ted Yu wrote: > >> Koert: >> Have you tried adding the following on your commandline ? >> >> -Dscalastyle.failOnViolation=false >> >> Cheers

Re: scalastyle annoys me a little bit

2014-10-23 Thread Ted Yu
Created SPARK-4066 and attached a patch there. On Thu, Oct 23, 2014 at 1:07 PM, Koert Kuipers wrote: > great thanks i will do that > > On Thu, Oct 23, 2014 at 3:55 PM, Ted Yu wrote: >> Koert: >> If you have time, you can try this diff - with which you would be able to
