Re: test failed due to OOME

2015-11-02 Thread Ted Yu
Looks like SparkListenerSuite doesn't OOM on QA runs compared to Jenkins builds. I wonder if this is due to difference between machines running QA tests vs machines running Jenkins builds. On Fri, Oct 30, 2015 at 1:19 PM, Ted Yu wrote: > I noticed that the SparkContext created in each

Re: test failed due to OOME

2015-11-02 Thread Ted Yu
bly need to log into Jenkins and > heap dump some running tests and figure out what is going on. > > On Mon, Nov 2, 2015 at 7:42 AM, Ted Yu wrote: > >> Looks like SparkListenerSuite doesn't OOM on QA runs compared to Jenkins >> builds. >> >> I wonder if t

Re: Running individual test classes

2015-11-03 Thread Ted Yu
My experience is that going through tests in each module takes some time before reaching the test specified by the wildcard. Some tests, such as SparkLauncherSuite, would run even if not matched by the wildcard. FYI > On Nov 3, 2015, at 1:24 AM, Nitin Goyal wrote: > > In maven, you might want to try fo

Re: SparkLauncher#setJavaHome does not set JAVA_HOME in child process

2015-11-03 Thread Ted Yu
Opening JIRA is fine. Thanks On Tue, Nov 3, 2015 at 4:25 AM, gus wrote: > Thanks, Ted. > The SparkLauncher test suite runs fine for me, with or without the change. > Do you agree this is a bug? If so, should I open a JIRA? > > > > -- > View this message in context: > http://apache-spark-develop

Re: Master build fails ?

2015-11-03 Thread Ted Yu
Interesting, Sbt builds were not all failing: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/ FYI On Tue, Nov 3, 2015 at 5:58 AM, Jean-Baptiste Onofré wrote: > Hi Jacek, > > it works fine with mvn: the problem is with sbt. > > I suspect a different reactor order in sbt compare to

Re: Build a specific module only

2015-11-04 Thread Ted Yu
Please take a look at https://issues.apache.org/jira/browse/SPARK-10883 > On Nov 4, 2015, at 3:27 AM, gsvic wrote: > > Is it possible to build a specific spark module without building the whole > project? > > For example, I am trying to build sql-core project by > > /build/mvn -pl sql/core ins

Re: Master build fails ?

2015-11-05 Thread Ted Yu
> > Is there a solution to this ? > > Regards, > Dilip Biswal > Tel: 408-463-4980 > dbis...@us.ibm.com > > > > From: Jean-Baptiste Onofré > To:Ted Yu > Cc:"dev@spark.apache.org" > Date:

Re: Master build fails ?

2015-11-05 Thread Ted Yu
lip Biswal > Tel: 408-463-4980 > dbis...@us.ibm.com > > > > From:Ted Yu > To:Dilip Biswal/Oakland/IBM@IBMUS > Cc:Jean-Baptiste Onofré , "dev@spark.apache.org" > > Date:11/05/2015 10:46 AM > Subject:Re: Master build fail

Re: State of the Build

2015-11-05 Thread Ted Yu
See previous discussion: http://search-hadoop.com/m/q3RTtPnPnzwOhBr FYI On Thu, Nov 5, 2015 at 4:30 PM, Stephen Boesch wrote: > Yes. The current dev/change-scala-version.sh mutates (/pollutes) the build > environment by updating the pom.xml in each of the subprojects. If you were > able to come

Re: Master build fails ?

2015-11-06 Thread Ted Yu
Since maven is the preferred build vehicle, ivy style dependencies policy would produce surprising results compared to today's behavior. I would suggest staying with current dependencies policy. My two cents. On Fri, Nov 6, 2015 at 6:25 AM, Koert Kuipers wrote: > if there is no strong preferen

Re: State of the Build

2015-11-06 Thread Ted Yu
bq. include an sbt jar in the source repo Can you clarify which sbt jar (by path) ? I tried 'git log' on the following files but didn't see commit history: ./build/sbt-launch-0.13.7.jar ./build/zinc-0.3.5.3/lib/sbt-interface.jar ./sbt/sbt-launch-0.13.2.jar ./sbt/sbt-launch-0.13.5.jar On Fri, No

Re: Build fails due to...multiple overloaded alternatives of constructor RDDInfo define default arguments?

2015-11-07 Thread Ted Yu
Created a PR for the compilation error: https://github.com/apache/spark/pull/9538 Cheers On Sat, Nov 7, 2015 at 4:41 AM, Jacek Laskowski wrote: > Hi, > > Checked out the latest sources and the build failed: > > [error] > /Users/jacek/dev/oss/spark/core/src/main/scala/org/apache/spark/storage/RD

Re: Calling stop on StreamingContext locks up

2015-11-07 Thread Ted Yu
Would the following change work for you ? diff --git a/core/src/main/scala/org/apache/spark/util/AsynchronousListenerBus.scala b/core/src/main/scala/org/apache/spark/util/AsynchronousListenerBus.scala index 61b5a4c..c330d25 100644 --- a/core/src/main/scala/org/apache/spark/util/AsynchronousListene

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-08 Thread Ted Yu
+1 On Sat, Nov 7, 2015 at 4:35 PM, Denny Lee wrote: > +1 > > > On Sat, Nov 7, 2015 at 12:01 PM Mark Hamstra > wrote: > >> +1 >> >> On Tue, Nov 3, 2015 at 3:22 PM, Reynold Xin wrote: >> >>> Please vote on releasing the following candidate as Apache Spark version >>> 1.5.2. The vote is open unti

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-08 Thread Ted Yu
Why did you directly jump to spark-streaming-mqtt module ? Can you drop 'spark-streaming-mqtt' and try again ? Not sure why 1.5.0-SNAPSHOT showed up. Were you using RC2 source ? Cheers On Sun, Nov 8, 2015 at 7:28 PM, 欧锐 <494165...@qq.com> wrote: > > build spark-streaming-mqtt_2.10 failed! > >

Re: OLAP query using spark dataframe with cassandra

2015-11-09 Thread Ted Yu
Please consider using NoSQL engine such as hbase. Cheers > On Nov 9, 2015, at 3:03 PM, Andrés Ivaldi wrote: > > Hi, > I'm also considering something similar, Spark plain is too slow for my case, > a possible solution is use Spark as Multiple Source connector and basic > transformation layer,

Re: Seems jenkins is down (or very slow)?

2015-11-12 Thread Ted Yu
I was able to access the following where response was fast: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45806/ Cheers On Thu, Nov 12, 2015 at 6:21 PM, Yin Huai wrote: > Hi Guys, > > Seems Jenkins is

SparkPullRequestBuilder coverage

2015-11-13 Thread Ted Yu
Hi, I noticed that SparkPullRequestBuilder completes much faster than maven Jenkins build. From https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45871/consoleFull , I couldn't get exact time the builder started but looks like the duration was around 20 minutes. From https://ampl

Re: SparkPullRequestBuilder coverage

2015-11-13 Thread Ted Yu
are impacted by the change. E.g. if you only > modify SQL, it won't run the core or streaming tests. > > > On Fri, Nov 13, 2015 at 11:17 AM, Ted Yu wrote: > >> Hi, >> I noticed that SparkPullRequestBuilder completes much faster than maven >> Jenkins build. >

Re: spark 1.4 GC issue

2015-11-15 Thread Ted Yu
Please take a look at http://www.infoq.com/articles/tuning-tips-G1-GC Cheers On Sat, Nov 14, 2015 at 10:03 PM, Renu Yadav wrote: > I have tried with G1 GC .Please if anyone can provide their setting for GC. > At code level I am : > 1.reading orc table usind dataframe > 2.map df to rdd of my cas

Re: releasing Spark 1.4.2

2015-11-16 Thread Ted Yu
See this thread: http://search-hadoop.com/m/q3RTtLKc2ctNPcq&subj=Re+Spark+1+4+2+release+and+votes+conversation+ > On Nov 15, 2015, at 10:53 PM, Niranda Perera wrote: > > Hi, > > I am wondering when spark 1.4.2 will be released? > > is it in the voting stage at the moment? > > rgds > > --

Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-19 Thread Ted Yu
Should a new job be setup under Spark-Master-Maven-with-YARN for hadoop 2.6.x ? Cheers On Thu, Nov 19, 2015 at 5:16 PM, 张志强(旺轩) wrote: > I agreed > +1 > > -- > 发件人:Reynold Xin > 日 期:2015年11月20日 06:14:44 > 收件人:dev@spark.apache.org;

Re: [ANNOUNCE] Spark 1.6.0 Release Preview

2015-11-24 Thread Ted Yu
If I am not mistaken, the binaries for Scala 2.11 were generated against hadoop 1. What about binaries for Scala 2.11 against hadoop 2.x ? Cheers On Sun, Nov 22, 2015 at 2:21 PM, Michael Armbrust wrote: > In order to facilitate community testing of Spark 1.6.0, I'm excited to > announce the av

Re: [VOTE] Release Apache Spark 1.6.0 (RC1)

2015-12-02 Thread Ted Yu
I tried to run test suite and encountered the following: http://pastebin.com/DPnwMGrm FYI On Wed, Dec 2, 2015 at 12:39 PM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > -0 > > If spark-ec2 is still a supported part of the project, then we should > update its version lists as new relea

Re: [VOTE] Release Apache Spark 1.6.0 (RC1)

2015-12-02 Thread Ted Yu
+1 Ran through test suite (minus docker-integration-tests) which passed. Overall experience was much better compared with some of the prior RC's. [INFO] Spark Project External Kafka ... SUCCESS [ 53.956 s] [INFO] Spark Project Examples . SUCCESS [0

Maven build against Hadoop 2.4 times out

2015-12-11 Thread Ted Yu
Hi, You may have noticed that maven build against Hadoop 2.4 times out on Jenkins. The last module is spark-hive-thriftserver This seemed to start with build #4440 FYI - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org

Re: Maven build against Hadoop 2.4 times out

2015-12-13 Thread Ted Yu
ailed > way before the thrift server tests. > > On Fri, Dec 11, 2015 at 10:27 AM, Ted Yu wrote: > >> Hi, >> You may have noticed that maven build against Hadoop 2.4 times out on >> Jenkins. >> >> The last module is spark-hive-thrift

Re: Maven build against Hadoop 2.4 times out

2015-12-14 Thread Ted Yu
> I am wondering if there is any environment related issue. > > On Sun, Dec 13, 2015 at 3:38 PM, Ted Yu wrote: > >> Thanks for checking, Yin. >> >> Looks like the cause might be in one of the commits for build #4438 >> >> Cheers >> >>

Re: spark with label nodes in yarn

2015-12-15 Thread Ted Yu
Please take a look at: https://issues.apache.org/jira/browse/SPARK-7173 Cheers > On Dec 15, 2015, at 1:23 AM, 张志强(旺轩) wrote: > > Hi all, > > Has anyone tried label based scheduling via spark on yarn? I’ve tried that, > it didn’t work, spark 1.4.1 + apache hadoop 2.6.0 > > Any feedbacks are

Re: status of 2.11 support?

2015-12-15 Thread Ted Yu
Please see related JIRA: https://issues.apache.org/jira/browse/SPARK-8013 This question is better suited for user mailing list. Thanks On Mon, Dec 14, 2015 at 10:29 PM, Sachin Aggarwal < different.sac...@gmail.com> wrote: > Hi, > > > adding question from user group to dev group need expert advi

Re: spark with label nodes in yarn

2015-12-15 Thread Ted Yu
I was blocked to get the YARN containers by setting > spark.yarn.executor.nodeLabelExpression property. My question, > https://issues.apache.org/jira/browse/SPARK-7173 will fix this? > > > > Thanks > > Allen > > > > > > *发件人:* Ted Yu [mailto:yuzhih...@gmail.co

Re: spark with label nodes in yarn

2015-12-15 Thread Ted Yu
ny labels. > > > > It’s weird to me that YARN page shows my application is running, but > actually it is still waiting for its executor > > > > See the attached. > > > > Thanks, > > Allen > > > > *发件人:* Saisai Shao [mailto:sai.sai.s...@gmail.

Re: does spark really support label expr like && or || ?

2015-12-16 Thread Ted Yu
Allen: Since you mentioned scheduling, I assume you were talking about node label support in YARN. If that is the case, can you give us some more information: How node labels are setup in YARN cluster How you specified node labels in application Hadoop and Spark releases you are using Cheers >

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-16 Thread Ted Yu
Ran test suite (minus docker-integration-tests) All passed +1 [INFO] Spark Project External ZeroMQ .. SUCCESS [ 13.647 s] [INFO] Spark Project External Kafka ... SUCCESS [ 45.424 s] [INFO] Spark Project Examples . SUCCESS [02:06

Re: does spark really support label expr like && or || ?

2015-12-17 Thread Ted Yu
.jar > > so , my question is does the spark.yarn.executor.nodeLabelExpression > and spark.yarn.am.nodeLabelExpression really support "EXPRESSION" like and > &&, or ||, or even ! and so on. > > NOTE: > I didn't change the capacity-scheduler.xml at all,

Re: [Spark SQL] SQLContext getOrCreate incorrect behaviour

2015-12-21 Thread Ted Yu
In Jerry's example, the first SparkContext, sc, has been stopped. So there would be only one SparkContext running at any given moment. Cheers On Mon, Dec 21, 2015 at 8:23 AM, Chester @work wrote: > Jerry > I thought you should not create more than one SparkContext within one > Jvm, ... > C

Re: [VOTE] Release Apache Spark 1.6.0 (RC4)

2015-12-22 Thread Ted Yu
Running test suite, there was timeout in hive-thriftserver module. This has been fixed by SPARK-11823. So I assume this is test issue. lgtm On Tue, Dec 22, 2015 at 2:28 PM, Benjamin Fradet wrote: > +1 > On 22 Dec 2015 9:54 p.m., "Andrew Or" wrote: > >> +1 >> >> 2015-12-22 12:43 GMT-08:00 Reyn

Re: [DAGScheduler] resubmitFailedStages, failedStages.clear() and submitStage

2015-12-24 Thread Ted Yu
getMissingParentStages(stage) would be called for the stage (being re-submitted) If there is no missing parents, submitMissingTasks() would be called. If there is missing parent(s), the parent would go through the same flow. I don't see issue in this part of the code. Cheers On Thu, Dec 24, 201
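
The flow described in this message can be sketched as a small recursive function. This is only a toy model in Python, not Spark's Scala code: the dictionary-based stage graph, the `available` set, and the `ready` and `waiting` collections are assumptions made for illustration. Only the names getMissingParentStages and submitMissingTasks come from the message itself.

```python
def get_missing_parent_stages(stage, parents, available):
    """Return the parents of `stage` whose output is not yet available."""
    return [p for p in parents.get(stage, []) if p not in available]


def submit_stage(stage, parents, available, ready, waiting):
    """Toy model of the (re)submission flow.

    A stage with no missing parents is recorded in `ready`, which stands in
    for the submitMissingTasks() call. Otherwise each missing parent goes
    through the same flow and the stage itself is parked in `waiting`.
    """
    missing = get_missing_parent_stages(stage, parents, available)
    if not missing:
        if stage not in ready:
            ready.append(stage)
    else:
        for parent in missing:
            submit_stage(parent, parents, available, ready, waiting)
        waiting.add(stage)


# Hypothetical lineage: C depends on B, which depends on A.
parents = {"C": ["B"], "B": ["A"]}
ready, waiting = [], set()
submit_stage("C", parents, set(), ready, waiting)
```

With nothing available, only the root stage A is ready to run, while B and C wait on their parents. If A and B were already available, C would be submitted directly.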

recurring test failures against hadoop-2.4 profile

2015-12-25 Thread Ted Yu
Hi, You may have noticed the following test failures: org.apache.spark.sql.hive.execution.HiveUDFSuite.UDFIntegerToString org.apache.spark.sql.hive.execution.SQLQuerySuite.udf_java_method Tracing backwards, they started failing since this build: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Ma

Re: [VOTE] Release Apache Spark 1.6.0 (RC4)

2015-12-25 Thread Ted Yu
I found that SBT build for Scala 2.11 has been failing ( https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-branch-1.6-COMPILE-SBT-SCALA-2.11/3/consoleFull ) I logged SPARK-12527 and sent a PR. FYI On Tue, Dec 22, 2015 at 12:10 PM, Michael Armbrust wrote: > Please vote

Re: Akka with Spark

2015-12-26 Thread Ted Yu
Do you mind sharing your use case ? It may be possible to use a different approach than Akka. Cheers On Sat, Dec 26, 2015 at 10:08 AM, Disha Shrivastava wrote: > Hi, > > I wanted to know how to use Akka framework with Spark starting from > basics. I saw online that Spark uses Akka framework bu

Re: Akka with Spark

2015-12-27 Thread Ted Yu
scale them independently. So consider streaming data >>>> from Akka to Spark Streaming or go the other way, from Spark to Akka >>>> Streams. >>>> >>>> dean >>>> >>>> Dean Wampler, Ph.D. >>>> Author: Programming Scala,

Re: what is the best way to debug spark / mllib?

2015-12-27 Thread Ted Yu
For #1, 9 minutes seem to be normal. Here was duration for recent build on master branch: [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 10:44 mi

Re: Is there any way to stop a jenkins build

2015-12-29 Thread Ted Yu
HiveThriftBinaryServerSuite got stuck. I thought Josh has fixed this issue: [SPARK-11823][SQL] Fix flaky JDBC cancellation test in HiveThriftBinaryServerSuite On Tue, Dec 29, 2015 at 9:56 AM, Herman van Hövell tot Westerflier < hvanhov...@questtec.nl> wrote: > My AMPLAB jenkins build has been s

Re: Is there any way to stop a jenkins build

2015-12-29 Thread Ted Yu
, 2015 at 10:04 AM, Herman van Hövell tot Westerflier < > hvanhov...@questtec.nl> wrote: > >> Thanks. I'll merge the most recent master... >> >> Still curious if we can stop a build. >> >> Kind regards, >> >> Herman van Hövell tot Westerflier

IndentationCheck of checkstyle

2015-12-29 Thread Ted Yu
Hi, I noticed that there are a lot of checkstyle warnings in the following form: To my knowledge, we use two spaces for each tab. Not sure why all of a sudden we have so many IndentationCheck warnings: grep 'hild have incorrect indentati' trunkCheckstyle.xml | wc 3133 52645 678294 If th

Re: IndentationCheck of checkstyle

2015-12-29 Thread Ted Yu
Oops, wrong list :-) > On Dec 29, 2015, at 9:48 PM, Reynold Xin wrote: > > +Herman > > Is this coming from the newly merged Hive parser? > > > >> On Tue, Dec 29, 2015 at 9:46 PM, Allen Zhang wrote: >> >> >> format issue I think, go ahead

Re: IndentationCheck of checkstyle

2015-12-30 Thread Ted Yu
Right. Pardon my carelessness. > On Dec 29, 2015, at 9:58 PM, Reynold Xin wrote: > > OK to close the loop - this thread has nothing to do with Spark? > > >> On Tue, Dec 29, 2015 at 9:55 PM, Ted Yu wrote: >> Oops, wrong list :-) >> >>> On De

Re: Spark streaming 1.6.0-RC4 NullPointerException using mapWithState

2015-12-30 Thread Ted Yu
I went through StateMap.scala a few times but didn't find any logic error yet. According to the call stack, the following was executed in get(key): } else { parentStateMap.get(key) } This implies that parentStateMap was null. But it seems parentStateMap is properly assigned in readO

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Ted Yu
+1 > On Jan 5, 2016, at 10:49 AM, Davies Liu wrote: > > +1 > > On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas > wrote: >> +1 >> >> Red Hat supports Python 2.6 on REHL 5 until 2020, but otherwise yes, Python >> 2.6 is ancient history and the core Python developers stopped supporting it >> in

Re: Tungsten in a mixed endian environment

2016-01-12 Thread Ted Yu
I logged SPARK-12778 where endian awareness in Platform.java should help in mixed endian set up. There could be other parts of the code base which are related. Cheers On Tue, Jan 12, 2016 at 7:01 AM, Adam Roberts wrote: > Hi all, I've been experimenting with DataFrame operations in a mixed > e
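
As a generic illustration of why endian awareness matters in a mixed endian setup (this is not the change proposed in SPARK-12778, and the helper names are hypothetical), the sketch below writes a 32-bit integer in one byte order and reads it back in the other, using Python's standard `struct` module.

```python
import struct


def write_int(value, byte_order):
    """Serialize a signed 32-bit int; byte_order is '<' (little) or '>' (big)."""
    return struct.pack(byte_order + "i", value)


def read_int(raw, byte_order):
    """Deserialize a signed 32-bit int using the given byte order."""
    return struct.unpack(byte_order + "i", raw)[0]


raw = write_int(1, "<")           # bytes as a little-endian writer lays them out
same_order = read_int(raw, "<")   # reader agrees with the writer
mixed_order = read_int(raw, ">")  # reader assumes big-endian instead
```

When reader and writer agree, the value round-trips. When they disagree, the same four bytes decode to a different number, which is the kind of silent corruption an endian-aware code path is meant to prevent.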

Re: Dependency on TestingUtils in a Spark package

2016-01-12 Thread Ted Yu
There is no annotation in TestingUtils class indicating whether it is suitable for consumption by external projects. You should assume the class is not public since its methods may change in future Spark releases. Cheers On Tue, Jan 12, 2016 at 12:36 PM, Robert Dodier wrote: > Hi, > > I'm putt

Re: Spark 1.6.0 and HDP 2.2 - problem

2016-01-13 Thread Ted Yu
I would suggest trying option #1 first. Thanks > On Jan 13, 2016, at 2:12 AM, Maciej Bryński wrote: > > Hi, > I/m trying to run Spark 1.6.0 on HDP 2.2 > Everything was fine until I tried to turn on dynamic allocation. > According to instruction I need to add shuffle service to yarn classpath.

Re: timeout in shuffle problem

2016-01-24 Thread Ted Yu
Cycling past bits: http://search-hadoop.com/m/q3RTtU5CRU1KKVA42&subj=RE+shuffle+FetchFailedException+in+spark+on+YARN+job On Sun, Jan 24, 2016 at 5:52 AM, wangzhenhua (G) wrote: > Hi, > > I have a problem of time out in shuffle, it happened after shuffle write > and at the start of shuffle read,

Re: BUILD FAILURE at spark-sql_2.11?!

2016-01-27 Thread Ted Yu
Strangely both Jenkins jobs showed green status: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-master-COMPILE-sbt-SCALA-2.11/ https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-master-COMPILE-MAVEN-SCALA-2.11/ On Wed, Jan 27, 2016 at 12:47 AM,

Re: Spark 1.6.0 + Hive + HBase

2016-01-28 Thread Ted Yu
For the last two problems, hbase-site.xml seems not to be on classpath. Once hbase-site.xml is put on classpath, you should be able to make progress. Cheers > On Jan 28, 2016, at 1:14 AM, Maciej Bryński wrote: > > Hi, > I'm trying to run SQL query on Hive table which is stored on HBase. > I'

Re: 回复: Spark 1.6.0 + Hive + HBase

2016-01-28 Thread Ted Yu
alsIgnoreCase("string"); > > String tsColName = null; > if (iTimestamp >= 0) { > tsColName = > jobConf.get(serdeConstants.LIST_COLUMNS).split(",")[iTimestamp]; > } > > > > -- 原始邮件 -- > *发件人:* "Jörn Franke";;

Re: build error: code too big: specialStateTransition(int, IntStream)

2016-01-28 Thread Ted Yu
After this change: [SPARK-12681] [SQL] split IdentifiersParser.g into two files the biggest file under sql/catalyst/src/main/antlr3/org/apache/spark/sql/catalyst/parser is SparkSqlParser.g Maybe split SparkSqlParser.g up as well ? On Thu, Jan 28, 2016 at 5:21 AM, Iulian Dragoș wrote: > Hi,

Re: Scala 2.11 default build

2016-01-30 Thread Ted Yu
Does this mean the following Jenkins builds can be disabled ? https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-master-COMPILE-MAVEN-SCALA-2.11/ https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-master-COMPILE-sbt-SCALA-2.11/ Cheers On Sat, Jan 3

Re: Spark not able to fetch events from Amazon Kinesis

2016-01-30 Thread Ted Yu
w.r.t. protobuf-java version mismatch, I wonder if you can rebuild Spark with the following change (using maven): http://pastebin.com/fVQAYWHM Cheers On Sat, Jan 30, 2016 at 12:49 AM, Yash Sharma wrote: > Hi All, > I have a quick question if anyone has experienced this here. > > I have been tr

Re: Scala 2.11 default build

2016-02-01 Thread Ted Yu
The following jobs have been established for build against Scala 2.10: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-master-COMPILE-MAVEN-SCALA-2.10/ https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-master-COMPILE-sbt-SCALA-2.10/ FYI On Mon,

Re: Spark 1.6.1

2016-02-01 Thread Ted Yu
SPARK-12624 has been resolved. According to Wenchen, SPARK-12783 is fixed in 1.6.0 release. Are there other blockers for Spark 1.6.1 ? Thanks On Wed, Jan 13, 2016 at 5:39 PM, Michael Armbrust wrote: > Hey All, > > While I'm not aware of any critical issues with 1.6.0, there are several > corne

Re: Secure multi tenancy on in stand alone mode

2016-02-01 Thread Ted Yu
w.r.t. running Spark on YARN, there are a few outstanding issues. e.g. SPARK-11182 HDFS Delegation Token See also the comments under SPARK-12279 FYI On Mon, Feb 1, 2016 at 1:02 PM, eugene miretsky wrote: > When having multiple users sharing the same Spark cluster, it's a good > idea to isolat

Re: Encrypting jobs submitted by the client

2016-02-02 Thread Ted Yu
For #1, a brief search landed the following: core/src/main/scala/org/apache/spark/SparkConf.scala: DeprecatedConfig("spark.rpc", "2.0", "Not used any more.") core/src/main/scala/org/apache/spark/SparkConf.scala: "spark.rpc.numRetries" -> Seq( core/src/main/scala/org/apache/spark/SparkConf.scala:

Re: Building Spark with Custom Hadoop Version

2016-02-04 Thread Ted Yu
Assuming your change is based on hadoop-2 branch, you can use 'mvn install' command which would put artifacts under 2.8.0-SNAPSHOT subdir in your local maven repo. Here is an example: ~/.m2/repository/org/apache/hadoop/hadoop-hdfs/2.8.0-SNAPSHOT Then you can use the following command to build Spa

Re: Welcoming two new committers

2016-02-08 Thread Ted Yu
Congratulations, Herman and Wenchen. On Mon, Feb 8, 2016 at 9:15 AM, Matei Zaharia wrote: > Hi all, > > The PMC has recently added two new Spark committers -- Herman van Hovell > and Wenchen Fan. Both have been heavily involved in Spark SQL and Tungsten, > adding new features, optimizations and

Re: Error aliasing an array column.

2016-02-09 Thread Ted Yu
Do you mind pastebin'ning code snippet and exception one more time - I couldn't see them in your original email. Which Spark release are you using ? On Tue, Feb 9, 2016 at 11:55 AM, rakeshchalasani wrote: > Hi All: > > I am getting an "UnsupportedOperationException" when trying to alias an > ar

Re: Error aliasing an array column.

2016-02-09 Thread Ted Yu
sorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) > at > org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) > at > org.apache.spark.repl.SparkIMain.l

Re: Error aliasing an array column.

2016-02-09 Thread Ted Yu
> |arrayCol| > ++ > | [0, 1]| > | [1, 2]| > | [2, 3]| > | [3, 4]| > | [4, 5]| > | [5, 6]| > | [6, 7]| > | [7, 8]| > | [8, 9]| > | [9, 10]| > ++ > > > > On Tue, Feb 9, 2016 at 4:52 PM Ted Yu wrote: > >> How about chang

Re: Building Spark with a Custom Version of Hadoop: HDFS ClassNotFoundException

2016-02-11 Thread Ted Yu
Hdfs class is in hadoop-hdfs-XX.jar Can you check the classpath to see if the above jar is there ? Please describe the command lines you used for building hadoop / Spark. Cheers On Thu, Feb 11, 2016 at 5:15 PM, Charlie Wright wrote: > I am having issues trying to run a test job on a built ver

Re: Call wholeTextFiles to read gzip files

2016-02-16 Thread Ted Yu
Have you seen this thread ? http://stackoverflow.com/questions/24402737/how-to-read-gz-files-in-spark-using-wholetextfiles On Tue, Feb 16, 2016 at 2:17 AM, Deepak Gopalakrishnan wrote: > Hello, > > I'm reading S3 files using wholeTextFiles() . My files are gzip format but > the names of the fil

Re: 回复: a new FileFormat 5x~100x faster than parquet

2016-02-22 Thread Ted Yu
The referenced benchmark is in Chinese. Please provide English version so that more people can understand. For item 7, looks like the speed of ingest is much slower compared to using Parquet. Cheers On Mon, Feb 22, 2016 at 6:12 AM, 开心延年 wrote: > 1.ya100 is not only the invert index ,but also

Re: Opening a JIRA for QuantileDiscretizer bug

2016-02-22 Thread Ted Yu
When you click on Create, you're brought to 'Create Issue' dialog where you choose Project Spark. Component should be MLlib. Please see also: http://search-hadoop.com/m/q3RTtmsshe1W6cH22/spark+pull+template&subj=pull+request+template On Mon, Feb 22, 2016 at 6:45 PM, Pierson, Oliver C wrote: >

Re: Hbase in spark

2016-02-26 Thread Ted Yu
In hbase, there is hbase-spark module which supports bulk load. This module is to be backported in the upcoming 1.3.0 release. There is some pending work, such as HBASE-15271 . FYI On Fri, Feb 26, 2016 at 8:50 AM, Renu Yadav wrote: > Has anybody implemented bulk load into hbase using spark? >

Re: Spark log4j fully qualified class name

2016-02-27 Thread Ted Yu
Looking at https://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/PatternLayout.html *WARNING* Generating the caller class information is slow. Thus, use should be avoided unless execution speed is not an issue. On Sat, Feb 27, 2016 at 12:40 PM, Prabhu Joseph wrote: > Hi All, > > Whe

Re: Set up a Coverity scan for Spark

2016-03-04 Thread Ted Yu
Since majority of code is written in Scala which is not analyzed by Coverity, the efficacy of the tool seems limited. > On Mar 4, 2016, at 2:34 AM, Sean Owen wrote: > > https://scan.coverity.com/projects/apache-spark-2f9d080d-401d-47bc-9dd1-7956c411fbb4?tab=overview > > This has to be run man

Re: Set up a Coverity scan for Spark

2016-03-04 Thread Ted Yu
suggesting anyone run it regularly, > but one run to catch some bugs is useful. > > I've already triaged ~70 issues there just in the Java code, of which > a handful are important. > > On Fri, Mar 4, 2016 at 12:18 PM, Ted Yu wrote: > > Since majority of code is written

Re: Set up a Coverity scan for Spark

2016-03-04 Thread Ted Yu
ashCode > > On Fri, Mar 4, 2016 at 2:52 PM, Ted Yu wrote: > > Last time I checked there wasn't high impact defects. > > > > Mind pointing out the defects you think should be fixed ? > > > > Thanks > > > > On Fri, Mar 4, 2016 at 4:35 AM, Sean Owen w

Re: Spark SQL drops the HIVE table in "overwrite" mode while writing into table

2016-03-05 Thread Ted Yu
Please include the stack trace, code snippet, etc. in the JIRA you created so that people can reproduce what you saw. On Sat, Mar 5, 2016 at 7:02 AM, Dhaval Modi wrote: > > Regards, > Dhaval Modi > dhavalmod...@gmail.com > > -- Forwarded message -- > From: Dhaval Modi > Date: 5 March 2016 at

Re: Does anyone implement org.apache.spark.serializer.Serializer in their own code?

2016-03-07 Thread Ted Yu
Josh: SerializerInstance and SerializationStream would also become private[spark], right ? Thanks On Mon, Mar 7, 2016 at 6:57 PM, Josh Rosen wrote: > Does anyone implement Spark's serializer interface > (org.apache.spark.serializer.Serializer) in your own third-party code? If > so, please let m

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-03-19 Thread Ted Yu
com> wrote: > Looks like the other packages may also be corrupt. I’m getting the same > error for the Spark 1.6.1 / Hadoop 2.4 package. > > > https://s3.amazonaws.com/spark-related-packages/spark-1.6.1-bin-hadoop2.4.tgz > > Nick > ​ > > On Wed, Mar 16, 2016 at 8:28 PM

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-03-19 Thread Ted Yu
On Linux, I got: $ tar zxf spark-1.6.1-bin-hadoop2.6.tgz gzip: stdin: unexpected end of file tar: Unexpected EOF in archive tar: Unexpected EOF in archive tar: Error is not recoverable: exiting now On Wed, Mar 16, 2016 at 5:15 PM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > > https:
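
The gzip and tar errors quoted in this message indicate a compressed stream that ends early. A minimal sketch of how such a truncated download could be detected before unpacking, using only Python's standard library, is shown below. The function name `gzip_is_complete` is hypothetical, and the check only confirms that the gzip layer can be read to its end; it does not validate the tar contents.

```python
import gzip
import zlib


def gzip_is_complete(path, chunk_size=1 << 20):
    """Return True if the gzip stream at `path` can be read through to its end.

    A truncated archive raises EOFError while reading; a damaged one may
    raise gzip.BadGzipFile or zlib.error instead.
    """
    try:
        with gzip.open(path, "rb") as stream:
            # Read and discard the data in chunks so large files fit in memory.
            while stream.read(chunk_size):
                pass
    except (EOFError, gzip.BadGzipFile, zlib.error):
        return False
    return True
```

Calling `gzip_is_complete("spark-1.6.1-bin-hadoop2.6.tgz")` on a partial download would be expected to return False, matching the "unexpected end of file" message above.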

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-03-19 Thread Ted Yu
:48 PM, "Nicholas Chammas" > wrote: > >> Looks like the other packages may also be corrupt. I’m getting the same >> error for the Spark 1.6.1 / Hadoop 2.4 package. >> >> >> https://s3.amazonaws.com/spark-related-packages/spark-1.6.1-bin-hadoop2.4.tgz >

Re: Performance improvements for sorted RDDs

2016-03-21 Thread Ted Yu
Do you have performance numbers to back up this proposal for the cogroup operation ? Thanks On Mon, Mar 21, 2016 at 1:06 AM, JOAQUIN GUANTER GONZALBEZ < joaquin.guantergonzal...@telefonica.com> wrote: > Hello devs, > > > > I have found myself in a situation where Spark is doing sub-optimal > computat

Re: error occurs to compile spark 1.6.1 using scala 2.11.8

2016-03-22 Thread Ted Yu
From the error message, it seems some artifacts from Scala 2.10.4 were left around. FYI maven 3.3.9 is required for master branch. On Tue, Mar 22, 2016 at 3:07 AM, Allen wrote: > Hi, > > I am facing an error when doing compilation from IDEA, please see the > attached. I fired the build process

Re: BlockManager WARNINGS and ERRORS

2016-03-27 Thread Ted Yu
The warning was added by: SPARK-12757 Add block-level read/write locks to BlockManager On Sun, Mar 27, 2016 at 12:24 PM, salexln wrote: > HI all, > > I started testing my code (https://github.com/salexln/FinalProject_FCM) > with the latest Spark available in GitHub, > and when I run it I get th

Re: OOM and "spark.buffer.pageSize"

2016-03-28 Thread Ted Yu
I guess you have looked at MemoryManager#pageSizeBytes where the "spark.buffer.pageSize" config can override default page size. FYI On Mon, Mar 28, 2016 at 12:07 PM, Steve Johnston < sjohns...@algebraixdata.com> wrote: > I'm attempting to address an OOM issue. I saw referenced in > java.lang.Out

explain codegen

2016-04-03 Thread Ted Yu
Hi, Based on master branch refreshed today, I issued 'git clean -fdx' first. Then this command: build/mvn clean -Phive -Phive-thriftserver -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.0 package -DskipTests I got the following error: scala> sql("explain codegen select 'a' as a group by 1").head org.

Re: explain codegen

2016-04-04 Thread Ted Yu
. > > > On Sun, Apr 3, 2016 at 9:38 PM, Jacek Laskowski wrote: > >> Hi, >> >> Looks related to the recent commit... >> >> Repository: spark >> Updated Branches: >> refs/heads/master 2262a9335 -> 1f0c5dceb >> >> [SPARK-14350][

Re: explain codegen

2016-04-04 Thread Ted Yu
gards, > > Herman van Hövell > > 2016-04-04 12:15 GMT+02:00 Ted Yu : > >> Could the error I encountered be due to missing import(s) of implicit ? >> >> Thanks >> >> On Sun, Apr 3, 2016 at 9:42 PM, Reynold Xin wrote: >> >>> Works fo

Re: RDD Partitions not distributed evenly to executors

2016-04-04 Thread Ted Yu
bq. the modifications do not touch the scheduler If the changes can be ported over to 1.6.1, do you mind reproducing the issue there ? I ask because master branch changes very fast. It would be good to narrow the scope where the behavior you observed started showing. On Mon, Apr 4, 2016 at 6:12

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-04-04 Thread Ted Yu
> > >>> > On Fri, Mar 18, 2016 at 6:11 PM Jakob Odersky >>> wrote: >>> >> >>> >> I just experienced the issue, however retrying the download a second >>> >> time worked. Could it be that there is some load balancer/c

Re: explain codegen

2016-04-04 Thread Ted Yu
on't you wipe everything out and try again? > > On Monday, April 4, 2016, Ted Yu wrote: > >> The commit you mentioned was made Friday. >> I refreshed workspace Sunday - so it was included. >> >> Maybe this was related: >> >> $ bin/spark-shell >

Re: error: reference to sql is ambiguous after import org.apache.spark._ in shell?

2016-04-04 Thread Ted Yu
Looks like the import comes from repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala : processLine("import sqlContext.sql") On Mon, Apr 4, 2016 at 5:16 PM, Jacek Laskowski wrote: > Hi Spark devs, > > I'm unsure if what I'm seeing is correct. I'd appreciate any input > to

Re: Build with Thrift Server & Scala 2.11

2016-04-05 Thread Ted Yu
Raymond: Did "namenode" appear in any of the Spark config files ? BTW Scala 2.11 is used by the default build. On Tue, Apr 5, 2016 at 6:22 AM, Raymond Honderdors < raymond.honderd...@sizmek.com> wrote: > I can see that the build is successful > > (-Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phi

Re: [STREAMING] DStreamClosureSuite.scala with { return; ssc.sparkContext.emptyRDD[Int] } Why?!

2016-04-05 Thread Ted Yu
The next line should give some clue: expectCorrectException { ssc.transform(Seq(ds), transformF) } Closure shouldn't include return. On Tue, Apr 5, 2016 at 3:40 PM, Jacek Laskowski wrote: > Hi, > > In > https://github.com/apache/spark/blob/master/streaming/src/test/scala/org/apache/spark/st

Re: Updating Spark PR builder and 2.x test jobs to use Java 8 JDK

2016-04-05 Thread Ted Yu
Josh: You may have noticed the following error ( https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7/566/console ): [error] javac: invalid source release: 1.8 [error] Usage: javac [error] use -help for a list of possible options On Tue, Apr 5, 2016 at 2:14 PM, Josh Ro

Re: BROKEN BUILD? Is this only me or not?

2016-04-05 Thread Ted Yu
Looking at recent https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7 builds, there was no such error. I don't see anything wrong with the code: usage = "_FUNC_(str) - " + "Returns str, with the first letter of each word in uppercase, all other letters in " + Mind

Re: BROKEN BUILD? Is this only me or not?

2016-04-05 Thread Ted Yu
i > > https://medium.com/@jaceklaskowski/ > Mastering Apache Spark http://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklaskowski > > > On Tue, Apr 5, 2016 at 8:41 PM, Ted Yu wrote: > > Looking at recent > > > https://amplab.cs.berkeley.e

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-04-06 Thread Ted Yu
Front >>> (i.e. >>> >> the “direct download” option on spark.apache.org) are also corrupt. >>> >> >>> >> Btw what’s the correct way to verify the SHA of a Spark package? I’ve >>> tried >>> >> a few commands on working pack
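
The quoted question about verifying the SHA of a package can be approached generically: compute a digest of the downloaded file and compare it with the published value. The sketch below is one way to do that with Python's `hashlib`. The preview does not show which algorithm or checksum file format applies, so the `sha512` default, the hex-string input, and both function names are assumptions.

```python
import hashlib


def file_digest(path, algorithm="sha512", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`, read in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_matches(path, expected_hex, algorithm="sha512"):
    """Compare the file's digest with an expected hex string.

    Whitespace and letter case in `expected_hex` are ignored, since
    published checksums are sometimes grouped or upper-cased (assumption).
    """
    normalized = "".join(expected_hex.split()).lower()
    return file_digest(path, algorithm) == normalized
```

A mismatch only shows that the local bytes differ from the ones the checksum was computed over; it does not say whether the download or the hosted file is at fault.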
