Re: How to clear the temp files that gets created by shuffle in Spark Streaming

2015-11-18 Thread Ted Yu
Have you seen SPARK-5836 ? Note TD's comment at the end. Cheers On Wed, Nov 18, 2015 at 7:28 PM, swetha wrote: > Hi, > > We have a lot of temp files that gets created due to shuffles caused by > group by. How to clear the files that gets created due to intermediate >
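For context, a hedged sketch of one cleanup knob sometimes used for long-running streaming jobs (whether it matches TD's suggestion in SPARK-5836 is not confirmed here; the value is illustrative):

```scala
import org.apache.spark.SparkConf

// spark.cleaner.ttl (Spark 1.x) periodically drops shuffle data and metadata
// older than the given number of seconds, so a long-running streaming job
// does not accumulate intermediate files indefinitely.
val conf = new SparkConf().set("spark.cleaner.ttl", "3600") // seconds; illustrative
```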

Re: zeppelin (or spark-shell) with HBase fails on executor level

2015-11-17 Thread Ted Yu
I am a bit curious: HBase depends on HDFS. Has HDFS support for Mesos been fully implemented ? Last time I checked, there was still work to be done. Thanks > On Nov 17, 2015, at 1:06 AM, 임정택 wrote: > > Oh, one thing I missed is, I built Spark 1.4.1 Cluster with 6 nodes of

Re: ISDATE Function

2015-11-17 Thread Ted Yu
ISDATE() is currently not supported. Since it is SQL Server specific, I guess it wouldn't be added to Spark. On Mon, Nov 16, 2015 at 10:46 PM, Ravisankar Mani wrote: > Hi Everyone, > > > In MSSQL server suppprt "ISDATE()" function is used to fine current > column values date

Re: how can evenly distribute my records in all partition

2015-11-17 Thread Ted Yu
Please take a look at the following for example: ./core/src/main/scala/org/apache/spark/api/python/PythonPartitioner.scala ./core/src/main/scala/org/apache/spark/Partitioner.scala Cheers On Tue, Nov 17, 2015 at 9:24 AM, prateek arora wrote: > Hi > Thanks > I am new
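A minimal sketch of a custom partitioner along those lines (the class name and partition count are illustrative):

```scala
import org.apache.spark.Partitioner

// Spread records across a fixed number of partitions by hashing the key,
// mirroring what HashPartitioner in Partitioner.scala does internally.
class EvenPartitioner(override val numPartitions: Int) extends Partitioner {
  override def getPartition(key: Any): Int = {
    val h = key.hashCode % numPartitions
    if (h < 0) h + numPartitions else h // hashCode may be negative
  }
}

// Usage sketch: pairRdd.partitionBy(new EvenPartitioner(16))
```

Note that hashing only evens things out when keys are well distributed; heavily skewed keys need key salting or a custom getPartition strategy.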

Re: spark with breeze error of NoClassDefFoundError

2015-11-17 Thread Ted Yu
Looking in local maven repo, breeze_2.10-0.7.jar contains DefaultArrayValue : jar tvf /Users/tyu/.m2/repository//org/scalanlp/breeze_2.10/0.7/breeze_2.10-0.7.jar | grep !$ jar tvf /Users/tyu/.m2/repository//org/scalanlp/breeze_2.10/0.7/breeze_2.10-0.7.jar | grep DefaultArrayValue 369 Wed Mar

Re: zeppelin (or spark-shell) with HBase fails on executor level

2015-11-17 Thread Ted Yu
ter and Mesos cluster for some reasons, and > I just can make it work via spark-submit or spark-shell / zeppelin with > newly initialized SparkContext. > > Thanks, > Jungtaek Lim (HeartSaVioR) > > 2015-11-17 22:17 GMT+09:00 Ted Yu <yuzhih...@gmail.com>: > >> I am a b

Re: Spark build error

2015-11-17 Thread Ted Yu
Is the Scala version in Intellij the same as the one used by sbt ? Cheers On Tue, Nov 17, 2015 at 6:45 PM, 金国栋 wrote: > Hi! > > I tried to build spark source code from github, and I successfully built > it from command line using `*sbt/sbt assembly*`. While I encountered an >

Re: Issue while Spark Job fetching data from Cassandra DB

2015-11-17 Thread Ted Yu
Have you considered polling the Cassandra mailing list ? A brief search led to CASSANDRA-7894. FYI On Tue, Nov 17, 2015 at 7:24 PM, satish chandra j wrote: > HI All, > I am getting "*.UnauthorizedException: User has no SELECT > permission on or any of its parents*"

Re: Invocation of StreamingContext.stop() hangs in 1.5

2015-11-17 Thread Ted Yu
I don't think you should call ssc.stop() in StreamingListenerBus thread. Please stop the context asynchronously. BTW I have a pending PR: https://github.com/apache/spark/pull/9741 On Tue, Nov 17, 2015 at 1:50 PM, jiten wrote: > Hi, > > We're using Spark 1.5 streaming.
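A sketch of stopping asynchronously, assuming `ssc` is the running StreamingContext:

```scala
// Stop from a separate thread rather than from inside a StreamingListener
// callback, which would block the listener bus that delivers the events.
new Thread("streaming-context-stopper") {
  override def run(): Unit = {
    ssc.stop(stopSparkContext = true, stopGracefully = true)
  }
}.start()
```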

Re: spark-submit stuck and no output in console

2015-11-16 Thread Ted Yu
Which release of Spark are you using ? Can you take stack trace and pastebin it ? Thanks On Mon, Nov 16, 2015 at 5:50 AM, Kayode Odeyemi wrote: > ./spark-submit --class com.migration.UpdateProfiles --executor-memory 8g > ~/migration-profiles-0.1-SNAPSHOT.jar > > is stuck

Re: YARN Labels

2015-11-16 Thread Ted Yu
There is no such configuration parameter for selecting which nodes the application master is running on. Cheers On Mon, Nov 16, 2015 at 12:52 PM, Alex Rovner wrote: > I was wondering if there is analogues configuration parameter to >

Re: YARN Labels

2015-11-16 Thread Ted Yu
Wangda, a YARN committer, told me that support for selecting which nodes the application master runs on is integrated into the upcoming Hadoop 2.8.0 release. Stay tuned. On Mon, Nov 16, 2015 at 1:36 PM, Ted Yu <yuzhih...@gmail.com> wrote: > There is no such configuration

Re: Spark SQL: filter if column substring does not contain a string

2015-11-15 Thread Ted Yu
Please take a look at test_column_operators in python/pyspark/sql/tests.py FYI On Sat, Nov 14, 2015 at 11:49 PM, YaoPau wrote: > I'm using pyspark 1.3.0, and struggling with what should be simple. > Basically, I'd like to run this: > > site_logs.filter(lambda r: 'page_row'
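For reference, a hedged sketch of the column-operator style those tests exercise, written here in the Scala DataFrame API (PySpark's Column supports the same operators); the column name `uri` and the substring bounds are assumptions:

```scala
// Keep rows where the first 20 characters of `uri` do NOT contain "page_row".
// `site_logs` is assumed to be a DataFrame.
val kept = site_logs.filter(!site_logs("uri").substr(0, 20).contains("page_row"))
```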

Re: spark 1.4 GC issue

2015-11-15 Thread Ted Yu
Please take a look at http://www.infoq.com/articles/tuning-tips-G1-GC Cheers On Sat, Nov 14, 2015 at 10:03 PM, Renu Yadav wrote: > I have tried with G1 GC .Please if anyone can provide their setting for GC. > At code level I am : > 1.reading orc table usind dataframe > 2.map
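As a starting point only (the flag values below are illustrative, not taken from the article), G1 can be enabled through the extra Java options:

```scala
import org.apache.spark.SparkConf

// Illustrative G1 settings; tune against your own heap size and GC logs.
val conf = new SparkConf()
  .set("spark.executor.extraJavaOptions",
       "-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -XX:+PrintGCDetails")
  .set("spark.driver.extraJavaOptions", "-XX:+UseG1GC")
```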

Re: Very slow startup for jobs containing millions of tasks

2015-11-15 Thread Ted Yu
> > Sent from my iPhone > >> On 14 Nov, 2015, at 11:21 pm, Ted Yu <yuzhih...@gmail.com> wrote: >> >> Which release are you using ? >> If older than 1.5.0, you miss some fixes such as SPARK-9952 >> >> Cheers >> >>> On S

Re: Very slow startup for jobs containing millions of tasks

2015-11-14 Thread Ted Yu
Which release are you using ? If older than 1.5.0, you miss some fixes such as SPARK-9952 Cheers On Sat, Nov 14, 2015 at 6:35 PM, Jerry Lam wrote: > Hi spark users and developers, > > Have anyone experience the slow startup of a job when it contains a stage > with over 4

Re: a way to allow spark job to continue despite task failures?

2015-11-13 Thread Ted Yu
I searched the code base and looked at: https://spark.apache.org/docs/latest/running-on-yarn.html I didn't find mapred.max.map.failures.percent or its counterpart. FYI On Fri, Nov 13, 2015 at 9:05 AM, Nicolae Marasoiu < nicolae.maras...@adswizz.com> wrote: > Hi, > > > I know a task can fail 2

Re: problem with spark.unsafe.offHeap & spark.sql.tungsten.enabled

2015-11-12 Thread Ted Yu
I tried with master branch. scala> sc.getConf.getAll.foreach(println) (spark.executor.id,driver) (spark.driver.memory,16g) (spark.unsafe.offHeap,true) (spark.driver.host,172.18.128.12) (spark.repl.class.uri,http://172.18.128.12:59780) (spark.sql.tungsten.enabled,true)

Re: NullPointerException with joda time

2015-11-12 Thread Ted Yu
will stick with this solution for the moment even if I find java Date > ugly. > > Thanks for your help. > > 2015-11-11 15:54 GMT+01:00 Ted Yu <yuzhih...@gmail.com>: > >> In case you need to adjust log4j properties, see the following thread: >> >> >> http

Re: NullPointerException with joda time

2015-11-11 Thread Ted Yu
In case you need to adjust log4j properties, see the following thread: http://search-hadoop.com/m/q3RTtJHkzb1t0J66=Re+Spark+Streaming+Log4j+Inside+Eclipse Cheers On Tue, Nov 10, 2015 at 1:28 PM, Ted Yu <yuzhih...@gmail.com> wrote: > I took a look at > https://github.com/JodaOrg/jo

Re: Start python script with SparkLauncher

2015-11-11 Thread Ted Yu
Please take a look at launcher/src/test/java/org/apache/spark/launcher/SparkLauncherSuite.java to see how app.getInputStream() and app.getErrorStream() are handled. In master branch, the Suite is located at core/src/test/java/org/apache/spark/launcher/SparkLauncherSuite.java FYI On Wed, Nov 11,

Re: how to run unit test for specific component only

2015-11-11 Thread Ted Yu
Have you tried the following ? build/sbt "sql/test-only *" Cheers On Wed, Nov 11, 2015 at 7:13 PM, weoccc wrote: > Hi, > > I am wondering how to run unit test for specific spark component only. > > mvn test -DwildcardSuites="org.apache.spark.sql.*" -Dtest=none > > The above

Re: Status of 2.11 support?

2015-11-11 Thread Ted Yu
n*. > > regards, > --Jakob > > *I'm myself pretty new to the Spark community so please don't take my > words on it as gospel > > > On 11 November 2015 at 15:25, Ted Yu <yuzhih...@gmail.com> wrote: > >> For #1, the published jars are usable. >> Howeve

Re: Creating new Spark context when running in Secure YARN fails

2015-11-11 Thread Ted Yu
Looks like the delegation token should be renewed. Mind trying the following ? Thanks diff --git a/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala b/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerB index 20771f6..e3c4a5a 100644

Re: Anybody hit this issue in spark shell?

2015-11-11 Thread Ted Yu
expose this issue in the PR build. Because > SBT build doesn't do the shading, now it's hard for us to find similar > issues in the PR build. > > Best Regards, > Shixiong Zhu > > 2015-11-09 18:47 GMT-08:00 Ted Yu <yuzhih...@gmail.com>: > >> Created https://githu

Re: Creating new Spark context when running in Secure YARN fails

2015-11-11 Thread Ted Yu
but am having the same problem. > > I ran: > > ./bin/pyspark --master yarn-client > > >> sc.stop() > >> sc = SparkContext() > > Same error dump as below. > > Do I need to pass something to the new sparkcontext ? > > Thanks, > Mike >

Re: Creating new Spark context when running in Secure YARN fails

2015-11-11 Thread Ted Yu
'true'), (u'spark.ssl.trustStore', > u'xxx.truststore')] > > I am not really familiar with "spark.yarn.credentials.file" and had > thought it was created automatically after communicating with YARN to get > tokens. > > Thanks, > Mike >

Re: Status of 2.11 support?

2015-11-11 Thread Ted Yu
For #1, the published jars are usable. However, you should build from source for your specific combination of profiles. Cheers On Wed, Nov 11, 2015 at 3:22 PM, shajra-cogscale wrote: > Hi, > > My company isn't using Spark in production yet, but we are using a bit of

Re: NullPointerException with joda time

2015-11-10 Thread Ted Yu
Can you show the stack trace for the NPE ? Which release of Spark are you using ? Cheers On Tue, Nov 10, 2015 at 8:20 AM, romain sagean wrote: > Hi community, > I try to apply the function below during a flatMapValues or a map but I > get a nullPointerException with the

Re: NullPointerException with joda time

2015-11-10 Thread Ted Yu
at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1400) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1361) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > 15/11/

Re: Anybody hit this issue in spark shell?

2015-11-10 Thread Ted Yu
n the PR build. > > Best Regards, > Shixiong Zhu > > 2015-11-09 18:47 GMT-08:00 Ted Yu <yuzhih...@gmail.com>: > >> Created https://github.com/apache/spark/pull/9585 >> >> Cheers >> >> On Mon, Nov 9, 2015 at 6:39 PM, Josh Rosen <joshro...@databric

Re: Spark IndexedRDD dependency in Maven

2015-11-09 Thread Ted Yu
I would suggest asking this question on SPARK-2365 since IndexedRDD has not been released (upstream) Cheers On Mon, Nov 9, 2015 at 1:34 PM, swetha wrote: > > Hi , > > What is the appropriate dependency to include for Spark Indexed RDD? I get > compilation error if I

Re: Anybody hit this issue in spark shell?

2015-11-09 Thread Ted Yu
9, 2015 at 6:13 PM, Michael Armbrust <mich...@databricks.com> > wrote: > >> Yeah, we should probably remove that. >> >> On Mon, Nov 9, 2015 at 5:54 PM, Ted Yu <yuzhih...@gmail.com> wrote: >> >>> If there is no option to let shell skip processing @Visi

Re: Anybody hit this issue in spark shell?

2015-11-09 Thread Ted Yu
If there is no option to let shell skip processing @VisibleForTesting , should the annotation be dropped ? Cheers On Mon, Nov 9, 2015 at 5:50 PM, Marcelo Vanzin wrote: > We've had this in the past when using "@VisibleForTesting" in classes > that for some reason the shell

Re: Anybody hit this issue in spark shell?

2015-11-09 Thread Ted Yu
Which branch did you perform the build with ? I used the following command yesterday: mvn -Phive -Phive-thriftserver -Pyarn -Phadoop-2.4 -Dhadoop.version=2.7.0 package -DskipTests Spark shell was working. Building with latest master branch. On Mon, Nov 9, 2015 at 10:37 AM, Zhan Zhang

Re: Anybody hit this issue in spark shell?

2015-11-09 Thread Ted Yu
I backtracked to: ef362846eb448769bcf774fc9090a5013d459464 The issue was still there. FYI On Mon, Nov 9, 2015 at 10:46 AM, Ted Yu <yuzhih...@gmail.com> wrote: > Which branch did you perform the build with ? > > I used the following command yesterday: > mvn -Phive -Phive-t

Re: parquet.io.ParquetEncodingException Warning when trying to save parquet file in Spark

2015-11-09 Thread Ted Yu
Please see https://issues.apache.org/jira/browse/PARQUET-124 > On Nov 8, 2015, at 11:43 PM, swetha wrote: > > Hi, > > I see unwanted Warning when I try to save a Parquet file in hdfs in Spark. > Please find below the code and the Warning message. Any idea as to how

Re: OLAP query using spark dataframe with cassandra

2015-11-09 Thread Ted Yu
Please consider using a NoSQL engine such as HBase. Cheers > On Nov 9, 2015, at 3:03 PM, Andrés Ivaldi wrote: > > Hi, > I'm also considering something similar, Spark plain is too slow for my case, > a possible solution is use Spark as Multiple Source connector and basic >

Re: Spark Job failing with exit status 15

2015-11-08 Thread Ted Yu
Which release of Spark were you using ? Can you post the command you used to run WordCount ? Cheers On Sat, Nov 7, 2015 at 7:59 AM, Shashi Vishwakarma wrote: > I am trying to run simple word count job in spark but I am getting > exception while running job. > > For

Re: [Spark-SQL]: Disable HiveContext from instantiating in spark-shell

2015-11-06 Thread Ted Yu
ypically, HiveContext has more functionality than SQLContext. In what case > you have to use SQLContext that cannot be done by HiveContext? > > Thanks. > > Zhan Zhang > > On Nov 6, 2015, at 10:43 AM, Jerry Lam <chiling...@gmail.com> wrote: > > What is interesting

Re: What is the efficient way to Join two RDDs?

2015-11-06 Thread Ted Yu
Can you tell us a bit more about your use case ? Are the two RDDs expected to be of roughly equal size, or of vastly different sizes ? Thanks On Fri, Nov 6, 2015 at 3:21 PM, swetha wrote: > Hi, > > What is the efficient way to join two RDDs? Would converting
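For the vastly-different-sizes case, a common sketch is a broadcast (map-side) join; the names and types below are illustrative:

```scala
// small: RDD[(K, W)] that fits in driver memory; large: RDD[(K, V)].
// Collect the small side, broadcast it, and join without shuffling `large`.
val smallMap = sc.broadcast(small.collectAsMap())
val joined = large.flatMap { case (k, v) =>
  smallMap.value.get(k).map(w => (k, (v, w)))
}
```

For two RDDs of similar, large size, a plain join() with both sides pre-partitioned by the same partitioner avoids re-shuffling either side.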

Re: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher

2015-11-06 Thread Ted Yu
You mentioned resourcemanager but not nodemanagers. I think you need to install Spark on nodes running nodemanagers. Cheers On Fri, Nov 6, 2015 at 1:32 PM, Kayode Odeyemi wrote: > Hi, > > I have a YARN hadoop setup of 8 nodes (7 datanodes, 1 namenode and > resourcemaneger).

Re: kerberos question

2015-11-04 Thread Ted Yu
2015-11-04 10:03:31,905 ERROR [Delegation Token Refresh Thread-0] hdfs.KeyProviderCache (KeyProviderCache.java:createKeyProviderURI(87)) - Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !! Could it be related to HDFS-7931 ? On Wed, Nov 4, 2015 at 12:30

Re: Executor app-20151104202102-0000 finished with state EXITED

2015-11-04 Thread Ted Yu
Have you tried using -Dspark.master=local ? Cheers On Wed, Nov 4, 2015 at 10:47 AM, Kayode Odeyemi wrote: > Hi, > > I can't seem to understand why all created executors always fail. > > I have a Spark standalone cluster setup make up of 2 workers and 1 master. > My spark-env

Re: Executor app-20151104202102-0000 finished with state EXITED

2015-11-04 Thread Ted Yu
s: > > conf.setMaster("spark://192.168.2.11:7077") > conf.set("spark.logConf", "true") > conf.set("spark.akka.logLifecycleEvents", "true") > conf.set("spark.executor.memory", "5g") > > On Wed, Nov 4, 2015 at 9:04 PM

Re: Allow multiple SparkContexts in Unit Testing

2015-11-04 Thread Ted Yu
Are you trying to speed up tests where each test suite uses single SparkContext ? You may want to read: https://issues.apache.org/jira/browse/SPARK-2243 Cheers On Wed, Nov 4, 2015 at 4:59 AM, Priya Ch wrote: > Hello All, > > How to use multiple Spark Context in
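A sketch of the shared-context pattern that discussion points toward, modeled loosely on Spark's own test helper (names are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, Suite}

// Reuse one SparkContext across all tests in a suite instead of creating
// several contexts concurrently.
trait SharedSparkContext extends BeforeAndAfterAll { self: Suite =>
  @transient var sc: SparkContext = _

  override def beforeAll(): Unit = {
    super.beforeAll()
    sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName(getClass.getSimpleName))
  }

  override def afterAll(): Unit = {
    if (sc != null) sc.stop()
    super.afterAll()
  }
}
```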

Re: Improve parquet write speed to HDFS and spark.sql.execution.id is already set ERROR

2015-11-03 Thread Ted Yu
I am a bit curious: why is the synchronization on finalLock needed ? Thanks > On Oct 23, 2015, at 8:25 AM, Anubhav Agarwal wrote: > > I have a spark job that creates 6 million rows in RDDs. I convert the RDD > into Data-frame and write it to HDFS. Currently it takes 3

Re: error with saveAsTextFile in local directory

2015-11-03 Thread Ted Yu
Looks like you were running a 1.4.x or earlier release, because the allowLocal flag is deprecated as of Spark 1.5.0. Cheers On Tue, Nov 3, 2015 at 3:07 PM, Jack Yang wrote: > Hi all, > > > > I am saving some hive- query results into the local directory: > > > > val hdfsFilePath

Re: How to enable debug in Spark Streaming?

2015-11-03 Thread Ted Yu
Take a look at: http://search-hadoop.com/m/q3RTtxRM5d2SLnmQ1=Re+Override+Logging+with+spark+streaming On Tue, Nov 3, 2015 at 5:29 AM, diplomatic Guru wrote: > I have an issue with a Spark Streaming job that appears to be running but > not producing any results.
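Separately, a small sketch that often suffices on its own (SparkContext.setLogLevel is available from Spark 1.4 onward):

```scala
// Raise log verbosity programmatically instead of editing log4j.properties;
// valid levels include INFO, DEBUG and TRACE.
sc.setLogLevel("DEBUG")
```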

Re: Required file not found: sbt-interface.jar

2015-11-02 Thread Ted Yu
sbt-interface.jar is under build/zinc-0.3.5.3/lib/sbt-interface.jar You can run build/mvn first to download it. Cheers On Mon, Nov 2, 2015 at 1:51 AM, Todd wrote: > Hi, > I am trying to build spark 1.5.1 in my environment, but encounter the > following error complaining

Re: How to lookup by a key in an RDD

2015-11-02 Thread Ted Yu
Please take a look at SPARK-2365 On Mon, Nov 2, 2015 at 3:25 PM, swetha kasireddy wrote: > Hi, > > Is Indexed RDDs released yet? > > Thanks, > Swetha > > On Sun, Nov 1, 2015 at 1:21 AM, Gylfi wrote: > >> Hi. >> >> You may want to look into Indexed

Re: apply simplex method to fix linear programming in spark

2015-11-01 Thread Ted Yu
A brief search in code base shows the following: TODO: Add simplex constraints to allow alpha in (0,1). ./mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala I guess the answer to your question is no. FYI On Sun, Nov 1, 2015 at 9:37 AM, Zhiliang Zhu

Re: job hangs when using pipe() with reduceByKey()

2015-10-31 Thread Ted Yu
Which Spark release are you using ? Which OS ? Thanks On Sat, Oct 31, 2015 at 5:18 AM, hotdog wrote: > I meet a situation: > When I use > val a = rdd.pipe("./my_cpp_program").persist() > a.count() // just use it to persist a > val b = a.map(s => (s,

Re: Sorry, but Nabble and ML suck

2015-10-31 Thread Ted Yu
From the result of http://search-hadoop.com/?q=spark+Martin+Senne , Martin's post on Tuesday didn't go through. FYI On Sat, Oct 31, 2015 at 9:34 AM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > Nabble is an unofficial archive of this mailing list. I don't know who > runs it, but it's

Re: Whether Spark will use disk when the memory is not enough on MEMORY_ONLY Storage Level

2015-10-30 Thread Ted Yu
Jone: For #3, consider asking on the vendor's mailing list. On Fri, Oct 30, 2015 at 7:11 AM, Akhil Das wrote: > You can set it to MEMORY_AND_DISK, in this case data will fall back to > disk when it exceeds the memory. > > Thanks > Best Regards > > On Fri, Oct 23, 2015 at

Re: how to merge two dataframes

2015-10-30 Thread Ted Yu
How about the following ? scala> df.registerTempTable("df") scala> df1.registerTempTable("df1") scala> sql("select customer_id, uri, browser, epoch from df union select customer_id, uri, browser, epoch from df1").show() +---+-+---+-+ |customer_id|
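The same merge can be expressed without SQL text, assuming both frames share a schema; distinct() mimics SQL UNION's duplicate removal:

```scala
// DataFrame.unionAll (1.x API) appends df1's rows to df's.
val merged = df.unionAll(df1).distinct()
```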

Re: key not found: sportingpulse.com in Spark SQL 1.5.0

2015-10-30 Thread Ted Yu
I searched for sportingpulse in *.scala and *.java files under 1.5 branch. There was no hit. mvn dependency doesn't show sportingpulse either. Is it possible this is specific to EMR ? Cheers On Fri, Oct 30, 2015 at 2:57 PM, Zhang, Jingyu wrote: > There is not a

Re: how to merge two dataframes

2015-10-30 Thread Ted Yu
a:51) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:221) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:242) > > > On Fri, Oct 30, 2015 at 3:34 PM, Ted Yu <yuzhih...@gmail.com> wrote: > &

Re: SparkLauncher is blocked until main process is killed.

2015-10-29 Thread Ted Yu
Which Spark release are you using ? Please note the typo in email subject (corrected as of this reply) On Thu, Oct 29, 2015 at 7:00 PM, Jey Kottalam wrote: > Could you please provide the jstack output? That would help the devs > identify the blocking operation more

Re: Building spark-1.5.x and MQTT

2015-10-28 Thread Ted Yu
MQTTUtils.class is generated from external/mqtt/src/main/scala/org/apache/spark/streaming/mqtt/MQTTUtils.scala What command did you use to build ? Which release / branch were you building ? Thanks On Wed, Oct 28, 2015 at 6:19 AM, Bob Corsaro wrote: > Has anyone successful

Re: Building spark-1.5.x and MQTT

2015-10-28 Thread Ted Yu
buntu boxen and a gentoo box. > > On Wed, Oct 28, 2015 at 9:59 AM Ted Yu <yuzhih...@gmail.com> wrote: > >> MQTTUtils.class is generated from >> external/mqtt/src/main/scala/org/apache/spark/streaming/mqtt/MQTTUtils.scala >> >> What command did you use to build

Re: spark to hbase

2015-10-27 Thread Ted Yu
Jinghong: Hadmin variable is not used. You can omit that line. Which hbase release are you using ? As Deng said, don't flush per row. Cheers > On Oct 27, 2015, at 3:21 AM, Deng Ching-Mallete wrote: > > Hi, > > It would be more efficient if you configure the table and

Re: Maven build failed (Spark master)

2015-10-27 Thread Ted Yu
I've ran this on both OSX Lion and Ubuntu 12. Same error. No .gz file > >> On Mon, Oct 26, 2015 at 9:10 PM, Ted Yu <yuzhih...@gmail.com> wrote: >> Looks like '-Pyarn' was missing in your command. >> >>> On Mon, Oct 26, 2015 at 12:06 PM, Kayode Odeyemi <drey...@gmai

Re: Maven build failed (Spark master)

2015-10-27 Thread Ted Yu
/docs.codehaus.org/display/MAVENUSER/Shade+Plugin-bin-spark-latest' > cp: cannot create directory > `/home/emperor/javaprojects/spark/spark-[WARNING] See > http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin-bin-spark-latest': > No such file or directory > > > On Tue, Oct 27, 2015 at 2

Re: spark to hbase

2015-10-27 Thread Ted Yu
Jinghong: In one of the earlier threads on storing data to HBase, it was found that the htrace jar was not on the classpath, leading to write failures. Can you check whether you are facing the same problem ? Cheers On Tue, Oct 27, 2015 at 5:11 AM, Ted Yu <yuzhih...@gmail.com> wrote: > Jinghong:

Re: spark to hbase

2015-10-27 Thread Ted Yu
ed-site.xml, yarn-default.xml, > yarn-site.xml, hdfs-default.xml, hdfs-site.xml, hbase-default.xml, > hbase-site.xml) > - field (class: com.chencai.spark.ml.TrainModel3$$anonfun$train$5, > name: configuration$1, type: class org.apache.hadoop.conf.Configuration) > - object (class > com.che

Re: Maven build failed (Spark master)

2015-10-26 Thread Ted Yu
pport/sql/parquet_partitioned/year=2015/month=9/day=1: > No such file or directory > cp: /usr/local/spark-latest/spark-[WARNING] See > http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin-bin-spark-latest/python/test_support/sql/parquet_partitioned/year=2015/month=9/day=1/.part-r-7.

Re: Dynamic Resource Allocation with Spark Streaming (Standalone Cluster, Spark 1.5.1)

2015-10-26 Thread Ted Yu
This is related: SPARK-10955 (Warn if dynamic allocation is enabled for Streaming jobs), which went into 1.6.0 as well. FYI On Mon, Oct 26, 2015 at 2:26 PM, Silvio Fiorito < silvio.fior...@granturing.com> wrote: > Hi Matthias, > > Unless there was a change in 1.5, I'm afraid dynamic resource

Re: Maven build failed (Spark master)

2015-10-26 Thread Ted Yu
If you use the command shown in: https://github.com/apache/spark/pull/9281 You should have got the following: ./dist/python/test_support/sql/parquet_partitioned/year=2014/month=9/day=1/part-r-8.gz.parquet

Re: Error Compiling Spark 1.4.1 w/ Scala 2.11 & Hive Support

2015-10-26 Thread Ted Yu
Scala 2.11 is supported in the 1.5.1 release: http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-parent_2.11%22 Can you upgrade ? Cheers On Mon, Oct 26, 2015 at 6:01 AM, Bryan Jeffrey wrote: > All, > > I'm seeing the following error compiling Spark 1.4.1 w/ Scala

Re: Problem with make-distribution.sh

2015-10-26 Thread Ted Yu
I logged SPARK-11318 with a PR. I verified that by adding -Phive the datanucleus jars are included: tar tzvf spark-1.6.0-SNAPSHOT-bin-custom-spark.tgz | grep datanucleus -rw-r--r-- hbase/hadoop 1890075 2015-10-26 09:52 spark-1.6.0-SNAPSHOT-bin-custom-spark/lib/datanucleus-core-3.2.10.jar

Re: rdd conversion

2015-10-26 Thread Ted Yu
bq. t = new Tuple2 (entry.getKey(), entry.getValue()); The return statement is outside the loop. That was why you got one RDD. On Mon, Oct 26, 2015 at 9:40 AM, Yasemin Kaya wrote: > Hi, > > I have *JavaRDD>>* and I want to >
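A hedged Scala sketch of the fix; the element type below is an assumption, since the generics in the original post were garbled:

```scala
// Emit one Tuple2 per map entry inside the transformation, instead of
// returning a single tuple after the loop finishes.
val pairs = rdd.flatMap { m: Map[String, String] => m.toSeq }
// pairs: RDD[(String, String)], one (key, value) pair per entry
```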

Re: SparkR in yarn-client mode needs sparkr.zip

2015-10-25 Thread Ted Yu
In zipRLibraries(): // create a zip file from scratch, do not append to existing file. val zipFile = new File(dir, name) I guess instead of creating sparkr.zip in the same directory as R lib, the zip file can be created under some directory writable by the user launching the app and

Re: Newbie Help for spark compilation problem

2015-10-25 Thread Ted Yu
A dependency couldn't be downloaded: [INFO] +- com.h2database:h2:jar:1.4.183:test Have you checked your network settings ? Cheers On Sun, Oct 25, 2015 at 10:22 AM, Bilinmek Istemiyor wrote: > Thank you for the quick reply. You are God Send. I have long not been >

Re: Spark scala REPL - Unable to create sqlContext

2015-10-25 Thread Ted Yu
Have you taken a look at the fix for SPARK-11000 which is in the upcoming 1.6.0 release ? Cheers On Sun, Oct 25, 2015 at 8:42 AM, Yao wrote: > I have not been able to start Spark scala shell since 1.5 as it was not > able > to create the sqlContext during the startup. It

Re: Error building Spark on Windows with sbt

2015-10-25 Thread Ted Yu
If you have a pull request, Jenkins can test your change for you. FYI > On Oct 25, 2015, at 12:43 PM, Richard Eggert wrote: > > Also, if I run the Maven build on Windows or Linux without setting > -DskipTests=true, it hangs indefinitely when it gets to >

Re: question about HadoopFsRelation

2015-10-24 Thread Ted Yu
The code below was introduced by SPARK-7673 / PR #6225 See item #1 in the description of the PR. Cheers On Sat, Oct 24, 2015 at 12:59 AM, Koert Kuipers wrote: > the code that seems to flatMap directories to all the files inside is in > the private

Re: Stream are not serializable

2015-10-23 Thread Ted Yu
Mind sharing your code, if possible ? Thanks On Fri, Oct 23, 2015 at 9:49 AM, crakjie wrote: > Hello. > > I have activated the file checkpointing for a DStream to unleach the > updateStateByKey. > My unit test worked with no problem but when I have integrated this in my > full

Re: unsubscribe

2015-10-23 Thread Ted Yu
Take a look at the first section of https://spark.apache.org/community On Fri, Oct 23, 2015 at 1:46 PM, wrote: > This e-mail and any files transmitted with it are for the sole use of the > intended recipient(s) and may contain confidential and privileged >

Re: get host from rdd map

2015-10-23 Thread Ted Yu
Can you outline your use case a bit more ? Do you want to know all the hosts which would run the map ? Cheers On Fri, Oct 23, 2015 at 5:16 PM, weoccc wrote: > in rdd map function, is there a way i can know the list of host names > where the map runs ? any code sample would

Re: Spark issue running jar on Linux vs Windows

2015-10-22 Thread Ted Yu
RemoteActorRefProvider is in akka-remote_2.10-2.3.11.jar jar tvf ~/.m2/repository/com/typesafe/akka/akka-remote_2.10/2.3.11/akka-remote_2.10-2.3.11.jar | grep RemoteActorRefProvi 1761 Fri May 08 16:13:02 PDT 2015 akka/remote/RemoteActorRefProvider$$anonfun$5.class 1416 Fri May 08 16:13:02 PDT

Re: Can we add an unsubscribe link in the footer of every email?

2015-10-21 Thread Ted Yu
Such incidents are infrequent, so I don't think we need to add the footer for now. I checked several other Apache projects whose user@ lists I subscribe to - none has such a footer. Cheers On Wed, Oct 21, 2015 at 7:38 AM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: >

Re: Spark_sql

2015-10-21 Thread Ted Yu
I don't think passing sqlContext to map() is supported. Can you describe your use case in more detail ? Why do you need to create a DataFrame inside the map() function ? Cheers On Wed, Oct 21, 2015 at 6:32 PM, Ajay Chander wrote: > Hi Everyone, > > I have a use case where
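A sketch of the distinction, with illustrative names throughout:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// SQLContext lives on the driver, while map() runs on executors, where the
// context is not usable. So instead of
//   rdd.map { x => sqlContext.createDataFrame(...) }   // not supported
// build the DataFrame on the driver from a plain transformation:
val schema = StructType(Seq(StructField("value", StringType)))
val rowRdd = rdd.map(x => Row(x.toString)) // an ordinary map is fine
val df = sqlContext.createDataFrame(rowRdd, schema)
```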

Re: Spark opening to many connection with zookeeper

2015-10-20 Thread Ted Yu
How many regions does your table have ? Which HBase release do you use ? Cheers On Tue, Oct 20, 2015 at 12:32 AM, Amit Singh Hora wrote: > Hi All , > > My spark job started reporting zookeeper errors after seeing the zkdumps > from Hbase master i realized that there are N

hbase refguide URL

2015-10-20 Thread Ted Yu
Hi, I couldn't access the following URL (404): http://hbase.apache.org/book.html The above is linked from http://hbase.apache.org Where can I find the refguide ? Thanks

Re: Spark opening to many connection with zookeeper

2015-10-20 Thread Ted Yu
<hora.a...@gmail.com> wrote: > One region > ------ > From: Ted Yu <yuzhih...@gmail.com> > Sent: 20-10-2015 15:01 > To: Amit Singh Hora <hora.a...@gmail.com> > Cc: user <user@spark.apache.org> > Subject: Re: Spark opening t

Re: Spark opening to many connection with zookeeper

2015-10-20 Thread Ted Yu
lt;hora.a...@gmail.com> > Sent: 20-10-2015 20:38 > To: Ted Yu <yuzhih...@gmail.com> > Cc: user <user@spark.apache.org> > Subject: RE: Spark opening to many connection with zookeeper > > I used that also but the number of connection goes on increasing started

Re: Problem building Spark

2015-10-20 Thread Ted Yu
On my Mac: $ ls -l ~/.m2/repository/org/antlr/antlr/3.2/antlr-3.2.jar -rw-r--r-- 1 tyu staff 895124 Dec 17 2013 /Users/tyu/.m2/repository/org/antlr/antlr/3.2/antlr-3.2.jar Looks like there might be a network issue on your computer. Can you check ? Thanks On Tue, Oct 20, 2015 at 1:21 PM,

Re: unsubscribe

2015-10-20 Thread Ted Yu
Pete: Please don't piggyback unrelated email on another thread. To unsubscribe, see the first section of https://spark.apache.org/community On Tue, Oct 20, 2015 at 2:42 PM, Pete Zybrick wrote: > > >

Re: Get statistic result from RDD

2015-10-20 Thread Ted Yu
each ID, I need to > get the statistic information. > > > > Best > > Frank > > > > *From:* Ted Yu [mailto:yuzhih...@gmail.com] > *Sent:* Tuesday, October 20, 2015 3:12 PM > *To:* ChengBo > *Cc:* user > *Subject:* Re: Get statistic result from RDD > > >

Re: Get statistic result from RDD

2015-10-20 Thread Ted Yu
Your mapValues can emit a tuple. If p(0) is between 0 and 5, the first component of the tuple would be 1 and the second 0. If p(0) is 6 or 7, the first component would be 0 and the second 1. You can then use reduceByKey to sum up the corresponding components, as sketched below. On Tue, Oct 20, 2015 at 1:33 PM, Shepherd
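A direct sketch of that suggestion, assuming a pair RDD keyed by ID whose values are indexed sequences of Int:

```scala
// (1, 0) marks a value in [0, 5]; (0, 1) marks a 6 or 7. reduceByKey then
// sums each tuple component per key, yielding both counts in one pass.
val counts = rdd.mapValues { p =>
  if (p(0) >= 0 && p(0) <= 5) (1, 0)
  else if (p(0) == 6 || p(0) == 7) (0, 1)
  else (0, 0)
}.reduceByKey { case ((a1, b1), (a2, b2)) => (a1 + a2, b1 + b2) }
```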

Re: How to take user jars precedence over Spark jars

2015-10-19 Thread Ted Yu
Have you tried the following options ? --conf spark.driver.userClassPathFirst=true --conf spark.executor.userClassPathFirst=true Cheers On Mon, Oct 19, 2015 at 5:07 AM, YiZhi Liu wrote: > I'm trying to read a Thrift object from SequenceFile, using > elephant-bird's

Re: How to take user jars precedence over Spark jars

2015-10-19 Thread Ted Yu
niBasedUnixGroupsMappingWithFallback not > org.apache.hadoop.security.GroupMappingServiceProvider) > > 2015-10-19 22:23 GMT+08:00 Ted Yu <yuzhih...@gmail.com>: > > Have you tried the following options ? > > > > --conf spark.driver.userClassPathFirst=true --conf > > spark.execut

Re: new 1.5.1 behavior - exception on executor throws ClassNotFound on driver

2015-10-19 Thread Ted Yu
nnily enough, I have a repro that doesn't even use mysql so this seems > to be purely a classloader issue: > > source: http://pastebin.com/WMCMwM6T > 1.4.1: http://pastebin.com/x38DQY2p > 1.5.1: http://pastebin.com/DQd6k818 > > > > On Mon, Oct 19, 2015 at 11:51 AM, Te

Re: new 1.5.1 behavior - exception on executor throws ClassNotFound on driver

2015-10-19 Thread Ted Yu
The attachments didn't go through. Consider pastebinning them. Thanks On Mon, Oct 19, 2015 at 11:15 AM, gbop wrote: > I've been struggling with a particularly puzzling issue after upgrading to > Spark 1.5.1 from Spark 1.4.1. > > When I use the MySQL JDBC connector and an

Re: new 1.5.1 behavior - exception on executor throws ClassNotFound on driver

2015-10-19 Thread Ted Yu
tUvcBerd > > On Mon, Oct 19, 2015 at 11:18 AM, Ted Yu <yuzhih...@gmail.com> wrote: > >> The attachments didn't go through. >> >> Consider pastbebin'ning. >> >> Thanks >> >> On Mon, Oct 19, 2015 at 11:15 AM, gbop <lij.ta...@gmail.com> wrote:

Re: How to calculate row by now and output retults in Spark

2015-10-19 Thread Ted Yu
Under core/src/test/scala/org/apache/spark , you will find a lot of examples of the map function. FYI On Mon, Oct 19, 2015 at 10:35 AM, Shepherd wrote: > Hi all, I am new in Spark and Scala. I have a question in doing > calculation. I am using "groupBy" to generate key value

Re: serialization error

2015-10-19 Thread Ted Yu
Attachments didn't go through. Mind using pastebin to show the code / error ? Thanks On Mon, Oct 19, 2015 at 3:01 PM, daze5112 wrote: > Hi having some problems with the piece of code I inherited: > > > > > the error messages i get are: > > > the code runs if i

Re: Spark SQL Exception: Conf non-local session path expected to be non-null

2015-10-19 Thread Ted Yu
A brief search led me to ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java : private static final String HDFS_SESSION_PATH_KEY = "_hive.hdfs.session.path"; ... public static Path getHDFSSessionPath(Configuration conf) { SessionState ss = SessionState.get(); if (ss ==
