Re: SIGBUS (0xa) when using DataFrameWriter.insertInto

2018-10-27 Thread Ted Yu
I don't seem to find the log. Can you double check ? Thanks Original message From: alexzautke Date: 10/27/18 8:54 AM (GMT-08:00) To: user@spark.apache.org Subject: Re: SIGBUS (0xa) when using DataFrameWriter.insertInto Please also find attached a complete error log. -- Se

Re: error while submitting job

2018-09-29 Thread Ted Yu
Can you tell us the version of Spark and the connector you used ? Thanks  Original message From: yuvraj singh <19yuvrajsing...@gmail.com> Date: 9/29/18 10:42 PM (GMT-08:00) To: user@spark.apache.org Subject: error while submitting job Hi , i am getting this error please help

Re: OOM: Structured Streaming aggregation state not cleaned up properly

2018-05-19 Thread Ted Yu
Hi, w.r.t. ElementTrackingStore, since it is backed by KVStore, there should be other classes which occupy significant memory. Can you pastebin the top 10 entries from the heap dump? Thanks

Re: KafkaUtils.createStream(..) is removed for API

2018-02-18 Thread Ted Yu
createStream() is still in external/kafka-0-8/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala But it is not in external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaUtils.scala FYI On Sun, Feb 18, 2018 at 5:17 PM, naresh Goud wrote: > Hello Team, > > I s

Re: Broken SQL Visualization?

2018-01-15 Thread Ted Yu
Did you include any picture? It looks like the picture didn't go through. Please host it on a third-party site.  Thanks Original message From: Tomasz Gawęda Date: 1/15/18 2:07 PM (GMT-08:00) To: d...@spark.apache.org, user@spark.apache.org Subject: Broken SQL Visualization? Hi, today I ha

Re: how to mention others in JIRA comment please?

2017-06-26 Thread Ted Yu
You can find the JIRA handle of the person you want to mention by going to a JIRA where that person has commented. e.g. you want to find the handle for Joseph. You can go to: https://issues.apache.org/jira/browse/SPARK-6635 and click on his name in comment: https://issues.apache.org/jira/secure/V

Re: the compile of spark stoped without any hints, would you like help me please?

2017-06-25 Thread Ted Yu
Does adding -X to the mvn command give you more information? Cheers On Sun, Jun 25, 2017 at 5:29 AM, 萝卜丝炒饭 <1427357...@qq.com> wrote: > Hi all, > > Today I used a new PC to compile Spark. > At the beginning, it worked well. > But it stopped at some point. > The content in the console is: > ==

Re: HBaseContext with Spark

2017-01-25 Thread Ted Yu
Does the storage handler provide bulk load capability ? Cheers > On Jan 25, 2017, at 3:39 AM, Amrit Jangid wrote: > > Hi chetan, > > If you just need HBase Data into Hive, You can use Hive EXTERNAL TABLE with > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'. > > Try this if you

Re: HBaseContext with Spark

2017-01-25 Thread Ted Yu
The references are vendor-specific. I suggest contacting the vendor's mailing list for your PR. My initial interpretation was that you meant the Apache HBase repository. Cheers On Wed, Jan 25, 2017 at 7:38 AM, Chetan Khatri wrote: > @Ted Yu, Correct but HBase-Spark module available at HBase re

Re: HBaseContext with Spark

2017-01-25 Thread Ted Yu
Though no hbase release has the hbase-spark module, you can find the backport patch on HBASE-14160 (for Spark 1.6) You can build the hbase-spark module yourself. Cheers On Wed, Jan 25, 2017 at 3:32 AM, Chetan Khatri wrote: > Hello Spark Community Folks, > > Currently I am using HBase 1.2.4 and

Re: Approach: Incremental data load from HBASE

2016-12-21 Thread Ted Yu
processing is delivered to hbase. Cheers On Wed, Dec 21, 2016 at 8:00 AM, Chetan Khatri wrote: > Ok, Sure will ask. > > But what would be generic best practice solution for Incremental load from > HBASE. > > On Wed, Dec 21, 2016 at 8:42 PM, Ted Yu wrote: > >> I haven

Re: Approach: Incremental data load from HBASE

2016-12-21 Thread Ted Yu
I haven't used Gobblin. You can consider asking the Gobblin mailing list about the first option. The second option would work. On Wed, Dec 21, 2016 at 2:28 AM, Chetan Khatri wrote: > Hello Guys, > > I would like to understand different approach for Distributed Incremental > load from HBase, Is there

Re: namespace quota not take effect

2016-08-25 Thread Ted Yu
This question should have been posted to user@. It looks like you were using the wrong config. See: http://hbase.apache.org/book.html#quota See the 'Setting Namespace Quotas' section further down. Cheers On Tue, Aug 23, 2016 at 11:38 PM, W.H wrote: > hi guys > I am testing the hbase namespace quota at

Re: Attempting to accept an unknown offer

2016-08-17 Thread Ted Yu
me from a hive sql. There are other > similar jobs which work fine > > On Wed, Aug 17, 2016 at 8:52 AM, Ted Yu wrote: > >> Can you provide more information ? >> >> Were you running on YARN ? >> Which version of Spark are you using ? >> >> Was your job fail

Re: Attempting to accept an unknown offer

2016-08-17 Thread Ted Yu
Can you provide more information ? Were you running on YARN ? Which version of Spark are you using ? Was your job failing ? Thanks On Wed, Aug 17, 2016 at 8:46 AM, vr spark wrote: > > W0816 23:17:01.984846 16360 sched.cpp:1195] Attempting to accept an > unknown offer b859f2f3-7484-482d-8c0d-3

Re: Undefined function json_array_to_map

2016-08-17 Thread Ted Yu
Can you show the complete stack trace ? Which version of Spark are you using ? Thanks On Wed, Aug 17, 2016 at 8:46 AM, vr spark wrote: > Hi, > I am getting error on below scenario. Please suggest. > > i have a virtual view in hive > > view name log_data > it has 2 columns > > query_map

Re: Spark 2.0.0 JaninoRuntimeException

2016-08-16 Thread Ted Yu
t's a converted dataset of case classes to > dataframe. This is deterministically causing the error in Scala 2.11. > > Once I can get a deterministically breaking test without work code I will > try to file a Jira bug. > > On Tue, Aug 16, 2016, 04:17 Ted Yu wrote: > >> I t

Re: long lineage

2016-08-16 Thread Ted Yu
Have you tried periodic checkpoints? Cheers > On Aug 16, 2016, at 5:50 AM, pseudo oduesp wrote: > > Hi, > how can we deal with a StackOverflowError triggered by a long lineage? > I mean, I have this error; how can I resolve it without creating a new session? > thanks >
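The periodic-checkpoint suggestion can be sketched as below. The checkpoint directory, the local master, and the iterative job shape are illustrative assumptions, not details from the thread; on a real cluster the checkpoint directory must be on reliable shared storage.

```scala
// Hypothetical iterative job: checkpoint every 10 iterations to truncate
// the growing lineage and avoid StackOverflowError at planning time.
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

val sc = new SparkContext("local[*]", "lineage-demo")  // assumed local setup
sc.setCheckpointDir("/tmp/checkpoints")                // assumed path

var rdd: RDD[Int] = sc.parallelize(1 to 1000)
for (i <- 1 to 100) {
  rdd = rdd.map(_ + 1)     // each iteration adds a stage to the lineage
  if (i % 10 == 0) {
    rdd.cache()            // avoid recomputing when the checkpoint runs
    rdd.checkpoint()       // marks the RDD; lineage is cut once materialized
    rdd.count()            // an action forces the checkpoint to happen
  }
}
```

The `cache()` before `checkpoint()` avoids computing the RDD twice (once for the action, once when writing the checkpoint files).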

Re: class not found exception Logging while running JavaKMeansExample

2016-08-16 Thread Ted Yu
og4j and sl4j dependencies in pom. I am > still not getting what dependencies I am missing. > > Best Regards, > Subash Basnet > > On Mon, Aug 15, 2016 at 6:50 PM, Ted Yu wrote: > >> Logging has become private in 2.0 release: >> >> private[spark] tra

Re: Spark 2.0.0 JaninoRuntimeException

2016-08-16 Thread Ted Yu
-15285 with master branch. > Should we reopen SPARK-15285? > > Best Regards, > Kazuaki Ishizaki, > > > > From:Ted Yu > To:dhruve ashar > Cc:Aris , "user@spark.apache.org" > > Date:2016/08/15 06:19 > Subject:

Re: class not found exception Logging while running JavaKMeansExample

2016-08-15 Thread Ted Yu
Logging has become private in 2.0 release: private[spark] trait Logging { On Mon, Aug 15, 2016 at 9:48 AM, subash basnet wrote: > Hello all, > > I am trying to run JavaKMeansExample of the spark example project. I am > getting the classnotfound exception error: > *Exception in thread "main" jav

Re: Spark 2.0.0 JaninoRuntimeException

2016-08-14 Thread Ted Yu
Looks like the proposed fix was reverted: Revert "[SPARK-15285][SQL] Generated SpecificSafeProjection.apply method grows beyond 64 KB" This reverts commit fa244e5a90690d6a31be50f2aa203ae1a2e9a1cf. Maybe this was fixed in some other JIRA ? On Fri, Aug 12, 2016 at 2:30 PM, dhruve ashar w

Re: Why I can't use broadcast var defined in a global object?

2016-08-13 Thread Ted Yu
Can you (or David) resend David's reply ? I don't see the reply in this thread. Thanks > On Aug 13, 2016, at 8:39 PM, yaochunnan wrote: > > Hi David, > Your answers have solved my problem! Detailed and accurate. Thank you very > much! > > > > -- > View this message in context: > http://a

Re: Single point of failure with Driver host crashing

2016-08-11 Thread Ted Yu
Have you read https://spark.apache.org/docs/latest/spark-standalone.html#high-availability ? FYI On Thu, Aug 11, 2016 at 12:40 PM, Mich Talebzadeh wrote: > > Hi, > > Although Spark is fault tolerant when nodes go down like below: > > FROM tmp > [Stage 1:===>

Re: Getting a TreeNode Exception while saving into Hadoop

2016-08-08 Thread Ted Yu
lacesUnchanged.unionAll(placesAddedWithMerchantId).unionAll(placesUpdatedFromHotelsWithMerchantId).unionAll(placesUpdatedFromRestaurantsWithMerchantId).unionAll(placesChanged) > > I'm using Spark 1.6.2. > > On Mon, Aug 8, 2016 at 3:11 PM, Ted Yu wrote: > >&g

Re: Getting a TreeNode Exception while saving into Hadoop

2016-08-08 Thread Ted Yu
Can you show the code snippet for unionAll operation ? Which Spark release do you use ? BTW please use user@spark.apache.org in the future. On Mon, Aug 8, 2016 at 11:47 AM, max square wrote: > Hey guys, > > I'm trying to save Dataframe in CSV format after performing unionAll > operations on it

Re: Multiple Sources Found for Parquet

2016-08-08 Thread Ted Yu
Can you examine the classpath to see where DefaultSource comes from? Thanks On Mon, Aug 8, 2016 at 2:34 AM, 金国栋 wrote: > I'm using Spark2.0.0 to do sql analysis over parquet files, when using > `read().parquet("path")`, or `write().parquet("path")` in Java(I followed > the example java file in

Re: submitting spark job with kerberized Hadoop issue

2016-08-07 Thread Ted Yu
The link in Jerry's response was quite old. Please see: http://hbase.apache.org/book.html#security Thanks On Sun, Aug 7, 2016 at 6:55 PM, Saisai Shao wrote: > 1. Standalone mode doesn't support accessing kerberized Hadoop, simply > because it lacks the mechanism to distribute delegation tokens

Re: Symbol HasInputCol is inaccesible from this place

2016-08-06 Thread Ted Yu
can be made public in order to develop > custom transformers or any other alternatives ? > > On Sat, Aug 6, 2016 at 10:07 AM, Ted Yu wrote: > >> Is it because HasInputCol is private ? >> >> private[ml] trait HasInputCol extends Params { >> >> On Thu, Aug 4

Re: Symbol HasInputCol is inaccesible from this place

2016-08-06 Thread Ted Yu
Is it because HasInputCol is private ? private[ml] trait HasInputCol extends Params { On Thu, Aug 4, 2016 at 1:18 PM, janardhan shetty wrote: > Version : 2.0.0-preview > > import org.apache.spark.ml.param._ > import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol} > > > class Custom

Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Ted Yu
thanks, > Carlo > > On 5 Aug 2016, at 17:58, Ted Yu wrote: > > private[spark] trait Logging { > > > -- The Open University is incorporated by Royal Charter (RC 000391), an > exempt charity in England & Wales and a charity registered in Scotland (SC > 03830

Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Ted Yu
In 2.0, Logging became private: private[spark] trait Logging { FYI On Fri, Aug 5, 2016 at 9:53 AM, Carlo.Allocca wrote: > Dear All, > > I would like to ask for your help about the following issue: > java.lang.ClassNotFoundException: > org.apache.spark.Logging > > I checked and the class Loggi
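Since `org.apache.spark.Logging` became `private[spark]` in 2.0, a common workaround (my sketch, not something from the thread) is to define an equivalent trait in your own project on top of SLF4J, which Spark itself logs through:

```scala
import org.slf4j.{Logger, LoggerFactory}

// Minimal stand-in for the removed public Logging trait; method names
// mirror Spark's so existing call sites need not change.
trait Logging {
  @transient private lazy val log: Logger =
    LoggerFactory.getLogger(getClass.getName.stripSuffix("$"))

  protected def logInfo(msg: => String): Unit =
    if (log.isInfoEnabled) log.info(msg)
  protected def logWarning(msg: => String): Unit =
    if (log.isWarnEnabled) log.warn(msg)
  protected def logError(msg: => String): Unit =
    if (log.isErrorEnabled) log.error(msg)
}
```

The by-name `msg: => String` parameters keep message construction lazy, as Spark's original trait did.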

Re: What is "Developer API " in spark documentation?

2016-08-05 Thread Ted Yu
See previous discussion : http://search-hadoop.com/m/q3RTtTvrPrc6O2h1&subj=Re+discuss+separate+API+annotation+into+two+components+InterfaceAudience+InterfaceStability > On Aug 5, 2016, at 2:55 AM, Aseem Bansal wrote: > > Hi > > Many of spark documentation say "Developer API". What does that me

Re: source code for org.spark-project.hive

2016-08-04 Thread Ted Yu
https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2 FYI On Thu, Aug 4, 2016 at 6:23 AM, prabhat__ wrote: > hey > can anyone point me to the source code for the jars used with group-id > org.spark-project.hive. > This was previously maintained in the private repo of pwendell > (https://g

Re: how to debug spark app?

2016-08-03 Thread Ted Yu
Have you looked at: https://spark.apache.org/docs/latest/running-on-yarn.html#debugging-your-application If you use Mesos: https://spark.apache.org/docs/latest/running-on-mesos.html#troubleshooting-and-debugging On Wed, Aug 3, 2016 at 6:13 PM, glen wrote: > Any tool like gdb? Which support brea

Re: java.net.URISyntaxException: Relative path in absolute URI:

2016-08-03 Thread Ted Yu
SPARK-15899 ? On Wed, Aug 3, 2016 at 11:05 AM, Flavio wrote: > Hello everyone, > > I am try to run a very easy example but unfortunately I am stuck on the > follow exception: > > Exception in thread "main" java.lang.IllegalArgumentException: >

Re: Managed memory leak detected + OutOfMemoryError: Unable to acquire X bytes of memory, got 0

2016-08-03 Thread Ted Yu
Spark 2.0 has been released. Mind giving it a try :-) ? On Wed, Aug 3, 2016 at 9:11 AM, Rychnovsky, Dusan < dusan.rychnov...@firma.seznam.cz> wrote: > OK, thank you. What do you suggest I do to get rid of the error? > > > ---------- > *From:* Ted Yu

Re: Managed memory leak detected + OutOfMemoryError: Unable to acquire X bytes of memory, got 0

2016-08-03 Thread Ted Yu
> > > ---------- > *From:* Rychnovsky, Dusan > *Sent:* Wednesday, August 3, 2016 3:58 PM > *To:* Ted Yu > > *Cc:* user@spark.apache.org > *Subject:* Re: Managed memory leak detected + OutOfMemoryError: Unable to &g

Re: Managed memory leak detected + OutOfMemoryError: Unable to acquire X bytes of memory, got 0

2016-08-03 Thread Ted Yu
Are you using Spark 1.6+ ? See SPARK-11293 On Wed, Aug 3, 2016 at 5:03 AM, Rychnovsky, Dusan < dusan.rychnov...@firma.seznam.cz> wrote: > Hi, > > > I have a Spark workflow that when run on a relatively small portion of > data works fine, but when run on big data fails with strange errors. In the

Re: [2.0.0] mapPartitions on DataFrame unable to find encoder

2016-08-02 Thread Ted Yu
Using spark-shell of master branch: scala> case class Entry(id: Integer, name: String) defined class Entry scala> val df = Seq((1,"one"), (2, "two")).toDF("id", "name").as[Entry] 16/08/02 16:47:01 DEBUG package$ExpressionCanonicalizer: === Result of Batch CleanExpressions === !assertnotnull(inpu

Re: Extracting key word from a textual column

2016-08-02 Thread Ted Yu
+1 > On Aug 2, 2016, at 2:29 PM, Jörn Franke wrote: > > If you need to use single inserts, updates, deletes, select why not use hbase > with Phoenix? I see it as complementary to the hive / warehouse offering > >> On 02 Aug 2016, at 22:34, Mich Talebzadeh wrote: >> >> Hi, >> >> I decided t

Re: Job can not terminated in Spark 2.0 on Yarn

2016-08-02 Thread Ted Yu
Which hadoop version are you using ? Can you show snippet of your code ? Thanks On Tue, Aug 2, 2016 at 10:06 AM, Liangzhao Zeng wrote: > Hi, > > > I migrate my code to Spark 2.0 from 1.6. It finish last stage (and result is > correct) but get following errors then start over. > > > Any idea o

Re: ERROR Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

2016-08-01 Thread Ted Yu
My JDK is Java 1.8 u40 >> >> On Sun, Jul 24, 2016 at 3:45 AM, Ted Yu wrote: >> >>> Since you specified +PrintGCDetails, you should be able to get some >>> more detail from the GC log. >>> >>> Also, which JDK version are you using ? >>&

Re: JettyUtils.createServletHandler Method not Found?

2016-08-01 Thread Ted Yu
The original discussion was about Spark 1.3. Which Spark release are you using? Cheers On Mon, Aug 1, 2016 at 1:37 AM, bg_spark <1412743...@qq.com> wrote: > hello, I have the same problem as you; how did you solve it? > > > > -- > View this message in context: > http://apache-spark-user-l

Re: ERROR Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

2016-07-23 Thread Ted Yu
Since you specified +PrintGCDetails, you should be able to get some more detail from the GC log. Also, which JDK version are you using ? Please use Java 8 where G1GC is more reliable. On Sat, Jul 23, 2016 at 10:38 AM, Ascot Moss wrote: > Hi, > > I added the following parameter: > > --conf "spa

Re: Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError: Java heap space

2016-07-22 Thread Ted Yu
How much heap memory do you give the driver ? On Fri, Jul 22, 2016 at 2:17 PM, Andy Davidson < a...@santacruzintegration.com> wrote: > Given I get a stack trace in my python notebook I am guessing the driver > is running out of memory? > > My app is simple it creates a list of dataFrames from s3:

Re: NoClassDefFoundError with ZonedDateTime

2016-07-21 Thread Ted Yu
he spark application > itself? > > On Thu, Jul 21, 2016 at 9:37 PM Ted Yu wrote: > >> Might be classpath issue. >> >> Mind pastebin'ning the effective class path ? >> >> Stack trace of NoClassDefFoundError may also help provide some clue. >

Re: NoClassDefFoundError with ZonedDateTime

2016-07-21 Thread Ted Yu
Might be classpath issue. Mind pastebin'ning the effective class path ? Stack trace of NoClassDefFoundError may also help provide some clue. On Thu, Jul 21, 2016 at 8:26 PM, Ilya Ganelin wrote: > Hello - I'm trying to deploy the Spark TimeSeries library in a new > environment. I'm running Spar

Re: Is it good choice to use DAO to store results generated by spark application?

2016-07-20 Thread Ted Yu
You can decide which component(s) to use for storing your data. If you haven't used HBase before, it may be better to store data on HDFS and query it through Hive or Spark SQL. Maintaining HBase is not a trivial task, especially when the cluster size is large. How much data are you expecting to be writ

Re: Is it good choice to use DAO to store results generated by spark application?

2016-07-19 Thread Ted Yu
The hbase-spark module is in the upcoming HBase 2.0 release. Currently it is in the master branch of the HBase git repo. FYI On Tue, Jul 19, 2016 at 8:27 PM, Andrew Ehrlich wrote: > There is a Spark<->HBase library that does this. I used it once in a > prototype (never tried in production though): > htt

Re: Missing Exector Logs From Yarn After Spark Failure

2016-07-19 Thread Ted Yu
What's the value for yarn.log-aggregation.retain-seconds and yarn.log-aggregation-enable ? Which hadoop release are you using ? Thanks On Tue, Jul 19, 2016 at 3:23 PM, Rachana Srivastava < rachana.srivast...@markmonitor.com> wrote: > I am trying to find the root cause of recent Spark applicatio

Re: I'm trying to understand how to compile Spark

2016-07-19 Thread Ted Yu
org.apache.spark.mllib.fpm is not a maven goal. -pl is for building individual projects. Your first build action should not include -pl. On Tue, Jul 19, 2016 at 4:22 AM, Eli Super wrote: > Hi > > I have a windows laptop > > I just downloaded the spark 1.4.1 source code. > > I try to compile org.apach

Re: Spark ResourceLeak??

2016-07-19 Thread Ted Yu
ResourceLeakDetector doesn't seem to be from Spark. Please check dependencies for potential leak. Cheers On Tue, Jul 19, 2016 at 6:11 AM, Guruji wrote: > I am running a Spark Cluster on Mesos. The module reads data from Kafka as > DirectStream and pushes it into elasticsearch after referring t

Re: Input path does not exist error in giving input file for word count program

2016-07-15 Thread Ted Yu
From examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala: val lines = ssc.textFileStream(args(0)) val words = lines.flatMap(_.split(" ")) In your case, it looks like inputfile didn't correspond to an existing path. On Fri, Jul 15, 2016 at 1:05 AM, RK Spark w

Re: Call http request from within Spark

2016-07-14 Thread Ted Yu
I second what Pedro said in the second paragraph. Issuing an http request per row would not scale. On Thu, Jul 14, 2016 at 12:26 PM, Pedro Rodriguez wrote: > Hi Amit, > > Have you tried running a subset of the IDs locally on a single thread? It > would be useful to benchmark your getProfile funct

Re: Issue in spark job. Remote rpc client dissociated

2016-07-13 Thread Ted Yu
Which Spark release are you using ? Can you disclose what the folder processing does (code snippet is better) ? Thanks On Wed, Jul 13, 2016 at 9:44 AM, Balachandar R.A. wrote: > Hello > > In one of my use cases, i need to process list of folders in parallel. I > used > Sc.parallelize (list,lis

Re: Optimize filter operations with sorted data

2016-07-07 Thread Ted Yu
Does the filter under consideration operate on sorted column(s) ? Cheers > On Jul 7, 2016, at 2:25 AM, tan shai wrote: > > Hi, > > I have a sorted dataframe, I need to optimize the filter operations. > How does Spark performs filter operations on sorted dataframe? > > It is scanning all the

Re: Saving parquet table as uncompressed with write.mode("overwrite").

2016-07-03 Thread Ted Yu
Have you tried the following (note the extraneous dot in your config name) ? val c = sqlContext.setConf("spark.sql.parquet.compression.codec", "none") Also, parquet() has compression parameter which defaults to None FYI On Sun, Jul 3, 2016 at 2:42 PM, Mich Talebzadeh wrote: > Hi, > > I simply
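For reference, a sketch of the two settings discussed, with the config key spelled without the extra dot; `df` and the output path are placeholders:

```scala
// Session-wide: valid codec values include "uncompressed", "snappy", "gzip".
sqlContext.setConf("spark.sql.parquet.compression.codec", "uncompressed")
df.write.mode("overwrite").parquet("/tmp/out")

// Per write (Spark 2.x): the writer option overrides the session setting.
df.write.option("compression", "none").mode("overwrite").parquet("/tmp/out")
```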

Re: Spark driver assigning splits to incorrect workers

2016-07-01 Thread Ted Yu
I guess you extended some InputFormat for providing locality information. Can you share some code snippet ? Which non-distributed file system are you using ? Thanks On Fri, Jul 1, 2016 at 2:46 PM, Raajen wrote: > I would like to use Spark on a non-distributed file system but am having > troub

Re: Why so many parquet file part when I store data in Alluxio or File?

2016-06-30 Thread Ted Yu
Looking under Alluxio source, it seems only "fs.hdfs.impl.disable.cache" is in use. FYI On Thu, Jun 30, 2016 at 9:30 PM, Deepak Sharma wrote: > Ok. > I came across this issue. > Not sure if you already assessed this: > https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-6921 > > T

Re: Spark master shuts down when one of zookeeper dies

2016-06-30 Thread Ted Yu
But the master that was down , never comes up. > > Is this the expected ? Is there a way to get alert when a master is down ? > How to make sure that there is atleast one back up master is up always ? > > Thanks > Vimal > > > > > On Tue, Jun 28, 2016 at 7:24 PM

Metadata for the StructField

2016-06-29 Thread Ted Yu
You can specify Metadata for the StructField : case class StructField( name: String, dataType: DataType, nullable: Boolean = true, metadata: Metadata = Metadata.empty) { FYI On Wed, Jun 29, 2016 at 2:50 AM, pooja mehta wrote: > Hi, > > Want to add a metadata field to StructFiel
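A short sketch of attaching metadata to a field via `MetadataBuilder`; the key and value here are made-up examples:

```scala
import org.apache.spark.sql.types._

// Build an immutable Metadata object for the field.
val meta = new MetadataBuilder()
  .putString("description", "customer age in years")  // illustrative entry
  .build()

val schema = StructType(Seq(
  StructField("name", StringType),
  StructField("age", IntegerType, nullable = true, metadata = meta)
))

// For a DataFrame created with this schema, the metadata can be read back:
// df.schema("age").metadata.getString("description")
```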

Re: Modify the functioning of zipWithIndex function for RDDs

2016-06-28 Thread Ted Yu
._2 + > x._1.length) ? > > On Tue, Jun 28, 2016 at 11:09 PM, Ted Yu wrote: > >> Please take a look at: >> core/src/main/scala/org/apache/spark/rdd/ZippedWithIndexRDD.scala >> >> In compute() method: >> val split = splitIn.asInstanceOf[ZippedWi

Re: Modify the functioning of zipWithIndex function for RDDs

2016-06-28 Thread Ted Yu
Please take a look at: core/src/main/scala/org/apache/spark/rdd/ZippedWithIndexRDD.scala In compute() method: val split = splitIn.asInstanceOf[ZippedWithIndexRDDPartition] firstParent[T].iterator(split.prev, context).zipWithIndex.map { x => (x._1, split.startIndex + x._2) You can mo
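The per-partition arithmetic in `compute()` can be modeled in plain Scala, with no Spark required: each partition's indices start at the total element count of all earlier partitions.

```scala
// Partitions modeled as plain sequences.
val partitions: Seq[Seq[String]] = Seq(Seq("a", "b"), Seq("c"), Seq("d", "e"))

// startIndex for partition i = sum of lengths of partitions 0 until i
val startIndexes: Seq[Long] = partitions.scanLeft(0L)(_ + _.length)

val zipped: Seq[(String, Long)] =
  partitions.zip(startIndexes).flatMap { case (part, start) =>
    part.zipWithIndex.map { case (x, i) => (x, start + i) }
  }
// zipped: (a,0), (b,1), (c,2), (d,3), (e,4)
```

Modifying the offset expression (`start + i`) in this map is the analogue of the change discussed in the thread.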

Re: Spark master shuts down when one of zookeeper dies

2016-06-28 Thread Ted Yu
Please see these blog posts on the number of nodes in the quorum: http://stackoverflow.com/questions/13022244/zookeeper-reliability-three-versus-five-nodes http://www.ibm.com/developerworks/library/bd-zookeeper/ the paragraph starting with 'A quorum is represented by a strict majority of nodes' F
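The 'strict majority' rule from those references reduces to a one-line calculation:

```scala
// An ensemble of n ZooKeeper servers needs a strict majority to operate,
// so it tolerates the loss of n - (n / 2 + 1) servers.
def quorumSize(n: Int): Int = n / 2 + 1
def tolerableFailures(n: Int): Int = n - quorumSize(n)

// 3 servers survive 1 failure; 5 survive 2. An even-sized ensemble adds
// no safety: 4 servers also survive only 1 failure.
```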

Re: Utils and Logging cannot be accessed in package ....

2016-06-27 Thread Ted Yu
AFAICT Utils is private: private[spark] object Utils extends Logging { So is Logging: private[spark] trait Logging { FYI On Mon, Jun 27, 2016 at 8:20 AM, Paolo Patierno wrote: > Hello, > > I'm trying to use the Utils.createTempDir() method importing > org.apache.spark.util.Utils but the scal

Re: Arrays in Datasets (1.6.1)

2016-06-27 Thread Ted Yu
Can you show the stack trace for encoding error(s) ? Have you looked at the following test which involves NestedArray of primitive type ? ./sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala Cheers On Mon, Jun 27, 2016 at 8:50 AM, Daniel Imberman wr

Re: Logging trait in Spark 2.0

2016-06-24 Thread Ted Yu
See this related thread: http://search-hadoop.com/m/q3RTtEor1vYWbsW&subj=RE+Configuring+Log4J+Spark+1+5+on+EMR+4+1+ On Fri, Jun 24, 2016 at 6:07 AM, Paolo Patierno wrote: > Hi, > > developing a Spark Streaming custom receiver I noticed that the Logging > trait isn't accessible anymore in Spark

Re: DataFrame versus Dataset creation and usage

2016-06-24 Thread Ted Yu
In Spark 2.0, Dataset and DataFrame are unified. Would this simplify your use case ? On Fri, Jun 24, 2016 at 7:27 AM, Martin Serrano wrote: > Hi, > > I'm exposing a custom source to the Spark environment. I have a question > about the best way to approach this problem. > > I created a custom r

Re: Kryo ClassCastException during Serialization/deserialization in Spark Streaming

2016-06-23 Thread Ted Yu
Can you illustrate how sampleMap is populated ? Thanks On Thu, Jun 23, 2016 at 12:34 PM, SRK wrote: > Hi, > > I keep getting the following error in my Spark Streaming every now and then > after the job runs for say around 10 hours. I have those 2 classes > registered in kryo as shown below. s

Re: Multiple compute nodes in standalone mode

2016-06-23 Thread Ted Yu
Have you looked at: https://spark.apache.org/docs/latest/spark-standalone.html On Thu, Jun 23, 2016 at 12:28 PM, avendaon wrote: > Hi all, > > I have a cluster that has multiple nodes, and the data partition is > unified, > therefore all my nodes in my computer can access to the data I am worki

Re: NullPointerException when starting StreamingContext

2016-06-22 Thread Ted Yu
Which Scala version / Spark release are you using ? Cheers On Wed, Jun 22, 2016 at 8:20 PM, Sunita Arvind wrote: > Hello Experts, > > I am getting this error repeatedly: > > 16/06/23 03:06:59 ERROR streaming.StreamingContext: Error starting the > context, marking it as stopped > java.lang.Null

Re: spark-1.6.1-bin-without-hadoop can not use spark-sql

2016-06-22 Thread Ted Yu
http://d3kbcqa49mib13.cloudfront.net/spark-1.6.1-bin-hadoop2.6.tgz> which > is a pre-built package on hadoop 2.7.2? > > > > ------ Original Message ------ *From:* "Ted Yu"; *Sent:* Wednesday, June 22, 2016, 11:51 PM *To:* "喜之郎"<251922...@qq.com>; *

Re: Confusing argument of sql.functions.count

2016-06-22 Thread Ted Yu
; regardless of its type. Intuition here is that count should take no > parameter. Or am I missing something? > > Jakub > > On Wed, Jun 22, 2016 at 6:19 PM, Ted Yu wrote: > >> Are you referring to the following method in >> sql/core/src/main/scala/org/apache/spark/sql/

Re: Confusing argument of sql.functions.count

2016-06-22 Thread Ted Yu
Are you referring to the following method in sql/core/src/main/scala/org/apache/spark/sql/functions.scala : def count(e: Column): Column = withAggregateFunction { Did you notice this method ? def count(columnName: String): TypedColumn[Any, Long] = On Wed, Jun 22, 2016 at 9:06 AM, Jakub Dubo
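A sketch contrasting the two overloads; the DataFrame and a SparkSession named `spark` are assumptions for the example:

```scala
import org.apache.spark.sql.functions.count
import spark.implicits._

val df = Seq(("a", 1), ("b", 2), ("a", 3)).toDF("k", "v")

// count(e: Column) -- the untyped variant, counts non-null values of k:
df.agg(count($"k")).show()

// count(columnName: String) -- returns TypedColumn[Any, Long], intended
// for typed Dataset aggregations:
df.groupByKey(_.getAs[String]("k")).agg(count("k")).show()
```

So the `Column`-taking overload is for DataFrame-style expressions, while the `String`-taking one exists so the result is typed as `Long` in Dataset aggregations.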

Re: spark-1.6.1-bin-without-hadoop can not use spark-sql

2016-06-22 Thread Ted Yu
hat on param --hadoop, 2.7.2 or others? > > Sent from my Huawei phone > > > Original message > Subject: Re: spark-1.6.1-bin-without-hadoop can not use spark-sql > From: Ted Yu > To: 喜之郎 <251922...@qq.com> > Cc: user > > > I wonder if the tar ball was built with: > > -Phive -Phi

Re: spark-1.6.1-bin-without-hadoop can not use spark-sql

2016-06-22 Thread Ted Yu
I wonder if the tar ball was built with: -Phive -Phive-thriftserver Maybe rebuild by yourself with the above ? FYI On Wed, Jun 22, 2016 at 4:38 AM, 喜之郎 <251922...@qq.com> wrote: > Hi all. > I download spark-1.6.1-bin-without-hadoop.tgz >

Re: Spark 1.5.2 - Different results from reduceByKey over multiple iterations

2016-06-22 Thread Ted Yu
For the run which returned incorrect result, did you observe any error (on workers) ? Cheers On Tue, Jun 21, 2016 at 10:42 PM, Nirav Patel wrote: > I have an RDD[String, MyObj] which is a result of Join + Map operation. It > has no partitioner info. I run reduceByKey without passing any Partiti

Re: scala.NotImplementedError: put() should not be called on an EmptyStateMap while doing stateful computation on spark streaming

2016-06-21 Thread Ted Yu
Are you using 1.6.1 ? If not, does the problem persist when you use 1.6.1 ? Thanks > On Jun 20, 2016, at 11:16 PM, umanga wrote: > > I am getting following warning while running stateful computation. The state > consists of BloomFilter (stream-lib) as Value and Integer as key. > > The program

Re: Build Spark 2.0 succeeded but could not run it on YARN

2016-06-20 Thread Ted Yu
What operations did you run in the Spark shell ? It would be easier for other people to reproduce using your code snippet. Thanks On Mon, Jun 20, 2016 at 6:20 PM, Jeff Zhang wrote: > Could you check the yarn app logs for details ? run command "yarn logs > -applicationId " to get the yarn log >

Re: Accessing system environment on Spark Worker

2016-06-19 Thread Ted Yu
Have you looked at http://spark.apache.org/docs/latest/ec2-scripts.html ? There is a description of setting AWS_SECRET_ACCESS_KEY. On Sun, Jun 19, 2016 at 4:46 AM, Mohamed Taher AlRefaie wrote: > Hello all: > > I have an application that requires accessing DynamoDB tables. Each worker > establish

Re: How to cause a stage to fail (using spark-shell)?

2016-06-19 Thread Ted Yu
You can utilize a counter in external storage (e.g. a NoSQL store). When the counter reaches 2, stop throwing the exception so that the task passes. FYI On Sun, Jun 19, 2016 at 3:22 AM, Jacek Laskowski wrote: > Hi, > > Thanks Burak for the idea, but it *only* fails the tasks that > eventually fail the entir
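A plain-Scala simulation of that idea, with a local `AtomicInteger` standing in for the external counter (on a real cluster the counter must live outside the JVM, since retried tasks may run in fresh executors):

```scala
import java.util.concurrent.atomic.AtomicInteger

val attempts = new AtomicInteger(0)

// Task body: fail the first two attempts, succeed afterwards.
def task(): String =
  if (attempts.incrementAndGet() <= 2)
    throw new RuntimeException(s"simulated failure ${attempts.get()}")
  else "success"

// Stand-in for the scheduler retrying a failed task:
def runWithRetries(retriesLeft: Int): String =
  try task() catch {
    case _: RuntimeException if retriesLeft > 0 => runWithRetries(retriesLeft - 1)
  }

val outcome = runWithRetries(3)  // two failures, then success on attempt 3
```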

Re: Switching broadcast mechanism from torrrent

2016-06-19 Thread Ted Yu
st$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:120) >>>> at >>>> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:120) >>>> at scala.coll

Re: Dataset Select Function after Aggregate Error

2016-06-18 Thread Ted Yu
scala> ds.groupBy($"_1").count.select(expr("_1").as[String], expr("count").as[Long]) res0: org.apache.spark.sql.Dataset[(String, Long)] = [_1: int, count: bigint] scala> ds.groupBy($"_1").count.select(expr("_1").as[String], expr("count").as[Long]).show +---+-+ | _1|count| +---+-+ | 1|

Re: spark-xml - xml parsing when rows only have attributes

2016-06-17 Thread Ted Yu
Please see https://github.com/databricks/spark-xml/issues/92 On Fri, Jun 17, 2016 at 5:19 AM, VG wrote: > I am using spark-xml for loading data and creating a data frame. > > If xml element has sub elements and values, then it works fine. Example > if the xml element is like > > > test >

Re: Spark jobs without a login

2016-06-16 Thread Ted Yu
Can you describe more about the container ? Please show complete stack trace for the exception. Thanks On Thu, Jun 16, 2016 at 1:32 PM, jay vyas wrote: > Hi spark: > > Is it possible to avoid reliance on a login user when running a spark job? > > I'm running out a container that doesnt supply

Re: Kerberos setup in Apache spark connecting to remote HDFS/Yarn

2016-06-16 Thread Ted Yu
bq. Caused by: KrbException: Cannot locate default realm Can you show the rest of the stack trace ? What versions of Spark / Hadoop are you using ? Which version of Java are you using (local and in cluster) ? Thanks On Thu, Jun 16, 2016 at 6:32 AM, akhandeshi wrote: > I am trying to setup my

Re: Reporting warnings from workers

2016-06-15 Thread Ted Yu
Have you looked at: https://spark.apache.org/docs/latest/programming-guide.html#accumulators On Wed, Jun 15, 2016 at 1:24 PM, Mathieu Longtin wrote: > Is there a way to report warnings from the workers back to the driver > process? > > Let's say I have an RDD and do this: > > newrdd = rdd.map(s
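A sketch of the accumulator approach from that guide, shaped like the pseudo-code in the question; `somefunction` and its warning condition are placeholders (1.6-era accumulator API, matching the thread's timeframe):

```scala
// Counter visible to the driver after an action completes.
val warnings = sc.accumulator(0, "worker-warnings")

val newrdd = rdd.map { x =>
  val result = somefunction(x)
  if (result == null) warnings += 1   // record the warning worker-side
  result
}
newrdd.count()                        // accumulator values are only reliable after an action
println(s"warnings from workers: ${warnings.value}")
```

Note that a plain counter only tells you how many warnings occurred; to carry the messages themselves back, an accumulator of a collection type would be needed.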

Re: Spark 2.0 release date

2016-06-15 Thread Ted Yu
Andy: You should sense the tone in Mich's response. To my knowledge, there hasn't been an RC for the 2.0 release yet. Once we have an RC, it goes through the normal voting process. FYI On Wed, Jun 15, 2016 at 7:38 AM, andy petrella wrote: > > tomorrow lunch time > Which TZ :-) → I'm working on

Re: hivecontext error

2016-06-14 Thread Ted Yu
Which release of Spark are you using ? Can you show the full error trace ? Thanks On Tue, Jun 14, 2016 at 6:33 PM, Tejaswini Buche < tejaswini.buche0...@gmail.com> wrote: > I am trying to use hivecontext in spark. The following statements are > running fine : > > from pyspark.sql import HiveCon
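For reference, the Scala-side equivalent of the poster's setup is a one-liner, but it only works when the Spark build includes Hive support (`-Phive`); without it, construction fails with a `ClassNotFoundException`/`NoClassDefFoundError`, which is one common cause of HiveContext errors. A sketch:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

val sc: SparkContext = ???          // existing context
val hiveCtx = new HiveContext(sc)   // Spark 1.x API; 2.x uses SparkSession
hiveCtx.sql("SHOW TABLES").show()
```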

Re: MAtcheERROR : STRINGTYPE

2016-06-14 Thread Ted Yu
Can you give a bit more detail ? version of Spark complete error trace code snippet which reproduces the error On Tue, Jun 14, 2016 at 9:54 AM, pseudo oduesp wrote: > hello > > why i get this error > > when using > > assembleur = VectorAssembler( inputCols=l_CDMVT, > outputCol="aev"+"CODEM
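A likely cause of the `MatchError` on `StringType`: `VectorAssembler` only accepts numeric, boolean, and vector input columns, so a string column in `inputCols` fails during schema resolution. A Scala sketch of the usual fix, indexing string columns first (column names are placeholders, and `toDF` assumes the implicits import):

```scala
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}

val df = Seq(("A", 1.0), ("B", 2.0)).toDF("codem", "num_feature")
val indexer = new StringIndexer()
  .setInputCol("codem")             // string column: index it before assembling
  .setOutputCol("codem_idx")
val indexed = indexer.fit(df).transform(df)
val assembler = new VectorAssembler()
  .setInputCols(Array("codem_idx", "num_feature"))  // numeric inputs only
  .setOutputCol("aev")
val out = assembler.transform(indexed)
```

For nominal columns, a `OneHotEncoder` step between the indexer and the assembler is often also needed.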

Re: Spark Streaming application failing with Kerboros issue while writing data to HBase

2016-06-13 Thread Ted Yu
Can you show snippet of your code, please ? Please refer to obtainTokenForHBase() in yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala Cheers On Mon, Jun 13, 2016 at 4:44 AM, Kamesh wrote: > Hi All, > We are building a spark streaming application and that application
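Beyond the `obtainTokenForHBase()` path above, long-running streaming jobs often hit this when the HBase delegation token expires on the executors. One common workaround, assuming the keytab is available on the executor hosts, is an explicit keytab login before opening the HBase connection; principal and path below are placeholders:

```scala
import org.apache.hadoop.security.UserGroupInformation

UserGroupInformation.loginUserFromKeytab(
  "app-user@EXAMPLE.COM",                     // placeholder principal
  "/etc/security/keytabs/app-user.keytab")    // placeholder keytab path
```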

Re: Basic question. Access MongoDB data in Spark.

2016-06-13 Thread Ted Yu
Have you considered posting the question on stratio's mailing list ? You may get faster response there. On Mon, Jun 13, 2016 at 8:09 AM, Umair Janjua wrote: > Hi guys, > > I have this super basic problem which I cannot figure out. Can somebody > give me a hint. > > http://stackoverflow.com/que

Re: Spark Getting data from MongoDB in JAVA

2016-06-12 Thread Ted Yu
What's the value of spark.version ? Do you know which version of Spark mongodb connector 0.10.3 was built against ? You can use the following command to find out: mvn dependency:tree Maybe the Spark version you use is different from what mongodb connector was built against. On Fri, Jun 10, 2016

Re: Book for Machine Learning (MLIB and other libraries on Spark)

2016-06-11 Thread Ted Yu
Mich Talebzadeh > LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > http://talebzadehmich.wordpress.com > > >

Re: Book for Machine Learning (MLIB and other libraries on Spark)

2016-06-11 Thread Ted Yu
https://www.amazon.com/Machine-Learning-Spark-Powerful-Algorithms/dp/1783288515/ref=sr_1_1?ie=UTF8&qid=1465657706&sr=8-1&keywords=spark+mllib https://www.amazon.com/Spark-Practical-Machine-Learning-Chinese/dp/7302420424/ref=sr_1_3?ie=UTF8&qid=1465657706&sr=8-3&keywords=spark+mllib https://www.ama

Re: OutOfMemory when doing joins in spark 2.0 while same code runs fine in spark 1.5.2

2016-06-09 Thread Ted Yu
bq. Read data from hbase using custom DefaultSource (implemented using TableScan) Did you use the DefaultSource from hbase-spark module in hbase master branch ? If you wrote your own, mind sharing related code ? Thanks On Thu, Jun 9, 2016 at 2:53 AM, raaggarw wrote: > Hi, > > I was trying to p
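For readers following along, the minimal shape of a custom source implementing `TableScan` looks like the sketch below (an assumption about the poster's setup, not the hbase-spark module). With plain `TableScan`, `buildScan()` must return every column of every row, so wide HBase scans can pressure executor memory; `PrunedScan`/`PrunedFilteredScan` push projection and filters down instead, which is one avenue to check for the OOM.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types._

// Class must be named DefaultSource in the package passed to .format(...).
class DefaultSource extends RelationProvider {
  override def createRelation(ctx: SQLContext,
                              parameters: Map[String, String]): BaseRelation =
    new BaseRelation with TableScan {
      override val sqlContext: SQLContext = ctx
      override def schema: StructType = StructType(Seq(
        StructField("rowkey", StringType),
        StructField("value", StringType)))
      override def buildScan(): RDD[Row] =
        ctx.sparkContext.parallelize(Seq(Row("k1", "v1")))  // placeholder scan
    }
}
```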

Re: Write Ahead Log

2016-06-08 Thread Ted Yu
There was a minor typo in the name of the config: spark.streaming.receiver.writeAheadLog.enable Yes, it only applies to Streaming. On Wed, Jun 8, 2016 at 3:14 PM, Mohit Anchlia wrote: > Is something similar to park.streaming.receiver.writeAheadLog.enable > available on SparkContext? It looks l
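To make the answer concrete: the flag is a streaming-receiver setting, so it goes on the `SparkConf` used to build the `StreamingContext` rather than being a `SparkContext`-level toggle, and it requires a checkpoint directory to hold the log files. A sketch with a placeholder checkpoint path:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("wal-demo")
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")
val ssc = new StreamingContext(conf, Seconds(10))
ssc.checkpoint("hdfs:///checkpoints/wal-demo")   // WAL files live under the checkpoint dir
```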

Re: comparaing row in pyspark data frame

2016-06-08 Thread Ted Yu
Do you mean returning col3 and 0.4 for the example row below ? > On Jun 8, 2016, at 5:05 AM, pseudo oduesp wrote: > > Hi, > how we can compare multiples columns in datframe i mean > > if df it s dataframe like that : > >df.col1 | df.col2 | df.col3 >

Re: Apache design patterns

2016-06-07 Thread Ted Yu
I think this is the correct forum. Please describe your case. > On Jun 7, 2016, at 8:33 PM, Francois Le Roux wrote: > > HI folks, I have been working through the available online Apache spark > tutorials and I am stuck with a scenario that i would like to solve in SPARK. > Is this a forum
