bq. getServletHandlers is not intended for public use
From MetricsSystem.scala:
private[spark] class MetricsSystem private (
Looks like there is no easy way to extend REST API.
On Thu, Mar 24, 2016 at 1:09 PM, Sebastian Kochman <
sebastian.koch...@outlook.com> wrote:
> Hello,
> I have a ques
checkpointing instead of saving still wouldn't
> execute any action on the RDD -- it would just mark the point at which
> checkpointing should be done when an action is eventually run.
>
> On Wed, Mar 23, 2016 at 7:38 PM, Ted Yu wrote:
>
>> bq. when I get the last RDD
>
Here is the doc for defaultParallelism :
/** Default level of parallelism to use when not given by user (e.g.
parallelize and makeRDD). */
def defaultParallelism: Int = {
What if the user changes parallelism ?
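For illustration, a minimal sketch (local mode, illustrative values) showing that a user-set spark.default.parallelism is what defaultParallelism then reports:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local[4]")
  .setAppName("parallelism-demo")
  .set("spark.default.parallelism", "8")    // user-provided value
val sc = new SparkContext(conf)
sc.defaultParallelism                       // 8, not the cluster-derived default
sc.parallelize(1 to 100).partitions.length  // also 8 when numSlices is not given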
Cheers
On Fri, Mar 25, 2016 at 5:33 AM, manasdebashiskar
wrote:
> There is a sc
Looks like you forgot an import for Date.
FYI
On Fri, Mar 25, 2016 at 7:36 AM, Mich Talebzadeh
wrote:
>
>
> Hi,
>
> writing a UDF to convert a string into Date
>
> def ChangeDate(word : String) : Date = {
> | return
> TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(word),"dd/MM/yyyy"),"yyyy-MM-dd")
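For comparison, parsing such a string into java.sql.Date on the Scala side does not need the Hive functions at all; a minimal sketch (the date format and the function name are illustrative, not the poster's code):

import java.sql.Date
import java.text.SimpleDateFormat

def changeDate(word: String): Date = {
  // assumes input like "25/03/2016"
  val parsed = new SimpleDateFormat("dd/MM/yyyy").parse(word)
  new Date(parsed.getTime)
}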
Do you mind showing body of TO_DATE() ?
Thanks
On Fri, Mar 25, 2016 at 7:38 AM, Ted Yu wrote:
> Looks like you forgot an import for Date.
>
> FYI
>
> On Fri, Mar 25, 2016 at 7:36 AM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>>
>>
>>
This is the original subject of the JIRA:
Partition discovery fail if there is a _SUCCESS file in the table's root dir
If I remember correctly, there were discussions on how (traditional)
partition discovery slowed down Spark jobs.
Cheers
On Fri, Mar 25, 2016 at 10:15 AM, suresk wrote:
> In pr
> On 25 March 2016 at 14:54, Ted Yu wrote:
>
>> Do you mind showing body of TO_DATE() ?
>>
See this thread:
http://search-hadoop.com/m/q3RTtAvwgE7dEI02
On Fri, Mar 25, 2016 at 10:39 AM, prateek arora
wrote:
> Hi
>
> I want to submit spark application from outside of spark clusters . so
> please help me to provide a information regarding this.
>
> Regards
> Prateek
>
>
>
>
> I have one more question .. if i want to launch a spark application in
> production environment so is there any other way so multiple users can
> submit there job without having hadoop configuration .
>
> Regards
> Prateek
>
>
> On Fri, Mar 25, 2016 at 10:50 AM, Ted Yu
Which release of Spark do you use, Mich ?
In master branch, the message is more accurate
(sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/NoSuchItemException.scala):
override def getMessage: String = s"Table $table not found in database $db"
On Fri, Mar 25, 2016 at 3:21 PM,
able
> info OK
>
> HTH
>
>
> On Friday, 25 March 2016, 22:32, Ted Yu wrote:
>
>
> Which release of Spark do you use, Mich ?
>
> In master branch, the message is more accurate
> (sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/NoSuchItemException
> On 25 March 2016 at 22:40, Ted Yu wrote:
>
>> Looks like database support was fixed by:
>>
>> [SPARK-7943] [SPARK-8105] [SPAR
Session management has improved in 1.6.x (see SPARK-10810)
Mind giving 1.6.1 a try ?
Thanks
On Fri, Mar 25, 2016 at 3:48 PM, Mich Talebzadeh
wrote:
> I have noticed that the only sure way to specify a Hive table from Spark
> is to prefix it with database (DBName) name otherwise it seems to be
Same with master branch.
I found derby.log in the following two files:
.gitignore:derby.log
dev/.rat-excludes:derby.log
FYI
On Sat, Mar 26, 2016 at 4:09 AM, Mich Talebzadeh
wrote:
> Having moved to Spark 1.6.1, I have noticed that whenever I start a
> spark-sql or shell, a derby.log file is
park%20and%20Scala.pdf
>
>
> -- Forwarded message --
> From: Ted Yu
> Date: 26 March 2016 at 12:51
> Subject: Re: Any plans to migrate Transformer API to Spark SQL (closer to
> DataFrames)?
> To: Michał Zieliński
>
>
> Michal:
> Can you share the sli
Please take a look at the following method:
/**
 * Get the preferred locations of a partition, taking into account whether the
 * RDD is checkpointed.
 */
final def preferredLocations(split: Partition): Seq[String] = {
  checkpointRDD.map(_.getPreferredLocations(split)).getOrElse {
According to:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_HDP_RelNotes/bk_HDP_RelNotes-20151221.pdf
Spark 1.5.2 comes out of box.
Suggest moving questions on HDP to Hortonworks forum.
Cheers
On Sat, Mar 26, 2016 at 3:32 PM, Mich Talebzadeh
wrote:
> Thanks Jorn.
>
> Just to be
Please take a look at the MyRDD class in:
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
There is scaladoc for the class. See how getPreferredLocations() is
implemented.
Cheers
On Sun, Mar 27, 2016 at 2:01 AM, chenyong wrote:
> Thank you Ted for your reply.
>
> Your ex
Can you show the full stack trace (or top 10 lines) and the snippet using
your MyRDD ?
Thanks
On Sun, Mar 27, 2016 at 9:22 AM, Tenghuan He wrote:
> Hi everyone,
>
> I am creating a custom RDD which extends RDD and add a custom method,
> however the custom method cannot be found.
> The
ror: value customMethod is not a member of
> org.apache.spark.rdd.RDD[(Int, String)]*
>
> and the customable method in PairRDDFunctions.scala is
>
> def customable(partitioner: Partitioner): RDD[(K, V)] = self.withScope {
> new MyRDD[K, V](self, partitioner)
> }
>
>
d.RDD[(Int, String)] = MyRDD[3]* at
> customable at
> 5 :28
> 6 scala> *myrdd.customMethod(bulk)*
> *7 error: value customMethod is not a member of
> org.apache.spark.rdd.RDD[(Int, String)]*
>
> On Mon, Mar 28, 2016 at 12:50 AM, Ted Yu wrote:
>
>> bq. def cus
oject then the custom method can be called in the main function and it
>> works.
>> I misunderstood the usage of custom RDDs: a custom RDD does not have to be
>> written into the Spark project like UnionRDD or CogroupedRDD; you can just add
>> it to your own project.
>>
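For reference, a minimal sketch (hypothetical names, for spark-shell) of keeping such an extension entirely in your own project: an implicit class makes the extra method visible on a plain RDD[(K, V)], even though its static type is not MyRDD:

import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

implicit class CustomPairFunctions[K: ClassTag, V: ClassTag](rdd: RDD[(K, V)]) {
  def customMethod(partitioner: Partitioner): RDD[(K, V)] =
    rdd.partitionBy(partitioner)   // stand-in body; construct your own MyRDD here instead
}

// usage:
// sc.parallelize(Seq((1, "a"), (2, "b"))).customMethod(new org.apache.spark.HashPartitioner(2))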
Can you describe your use case a bit more ?
Since the row keys are not sorted in your example, there is a chance that
you get nondeterministic results when you aggregate on groups of two
successive rows.
Thanks
On Mon, Mar 28, 2016 at 9:21 AM, sujeet jog wrote:
> Hi,
>
> I have a RDD like this
Dropping dev@
Can you provide a bit more information ?
release of hbase
release of hadoop
I assume you're running on Linux.
Any change in Linux setup before the exception showed up ?
On Mon, Mar 28, 2016 at 10:30 AM, beeshma r wrote:
> Hi
> i am testing with newly build Hbase .Initially tab
Can you describe what gets triggered by triggerAndWait ?
Cheers
On Mon, Mar 28, 2016 at 1:39 PM, kpeng1 wrote:
> Hi All,
>
> I am currently trying to debug a spark application written in scala. I
> have
> a main method:
> def main(args: Array[String]) {
> ...
> SocialUtil.trigge
See this method:
lazy val rdd: RDD[T] = {
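In spark-shell terms, a minimal sketch (the path and the (String, Long) schema are illustrative):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

val df = sqlContext.read.json("people.json")
val rowRdd: RDD[Row] = df.rdd      // a DataFrame exposes its contents as an RDD of Rows
val plain: RDD[(String, Long)] = rowRdd.map(r => (r.getString(0), r.getLong(1)))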
On Mon, Mar 28, 2016 at 6:30 PM, Russell Jurney
wrote:
> Ok, I'm also unable to save to Elasticsearch using a dataframe's RDD. This
> seems related to DataFrames. Is there a way to convert a DataFrame's RDD to
> a 'normal' RDD?
>
>
> On Mon, Mar 28, 2
Can you disclose snippet of your code ?
Which Spark release do you use ?
Thanks
> On Mar 29, 2016, at 3:42 AM, Charan Adabala wrote:
>
> From the below image how can we reduce the computing time for the stages, at
> some stages the Executor Computing Time is less than 1 sec and some are
> cons
As the error said, com.sap.db.jdbc.topology.Host is not serializable.
Maybe post the question on an SAP HANA mailing list (if any) ?
On Tue, Mar 29, 2016 at 7:54 AM, reena upadhyay <
reena.upadh...@impetus.co.in> wrote:
> I am trying to execute query using spark sql on SAP HANA from spark
> shell. I
>
-c CORES, --cores CORES Total CPU cores to allow Spark applications to use
on the machine (default: all available); only on worker
bq. sc.getConf().set()
I think you should use this pattern (shown in
https://spark.apache.org/docs/latest/spark-standalone.html):
val conf = new SparkConf()
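For example, a minimal sketch following that pattern (master URL and values are illustrative):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("spark://master:7077")   // illustrative standalone master
  .setAppName("CountingSheep")
  .set("spark.cores.max", "10")       // cap the cores the app may use
val sc = new SparkContext(conf)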
Have you tried the following construct ?
new OrderedRDDFunctions[K, V, (K, V)](rdd).sortByKey()
See core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala
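A minimal sketch of that construct in Scala (illustrative data):

import org.apache.spark.rdd.OrderedRDDFunctions

val rdd = sc.parallelize(Seq((3, "c"), (1, "a"), (2, "b")))
val sorted = new OrderedRDDFunctions[Int, String, (Int, String)](rdd).sortByKey()
val inRange = new OrderedRDDFunctions[Int, String, (Int, String)](sorted).filterByRange(1, 2)
inRange.collect()   // Array((1,a), (2,b))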
On Wed, Mar 30, 2016 at 5:20 AM, Nirav Patel wrote:
> Hi, I am trying to use filterByRange feature of spark OrderedRDDFunctions
>
How did you specify the packages ?
See the following from
https://spark.apache.org/docs/latest/submitting-applications.html :
Users may also include any other dependencies by supplying a
comma-delimited list of maven coordinates with --packages.
On Wed, Mar 30, 2016 at 7:15 AM, Mustafa Elbehery
Looking through
https://spark.apache.org/docs/latest/configuration.html#spark-streaming , I
don't see config specific to YARN.
Can you pastebin the exception you saw ?
When the job stopped, was there any error ?
Thanks
On Wed, Mar 30, 2016 at 10:57 PM, Soni spark
wrote:
> Hi All,
>
> I am una
I tried this:
scala> final case class Text(id: Int, text: String)
warning: there was one unchecked warning; re-run with -unchecked for details
defined class Text
scala> val ds = Seq(Text(0, "hello"), Text(1, "world")).toDF.as[Text]
ds: org.apache.spark.sql.Dataset[Text] = [id: int, text: string]
Spark 1.6.1 uses this version of jackson:
2.4.4
Looks like Tranquility uses different version of jackson.
How do you build your jar ?
Consider using maven-shade-plugin to resolve the conflict if you use maven.
Cheers
On Thu, Mar 31, 2016 at 9:50 AM, Marcelo Oikawa wrote:
> Hi, list.
>
>
Please exclude jackson-databind - that was where the AnnotationMap class
comes from.
On Thu, Mar 31, 2016 at 11:37 AM, Marcelo Oikawa <
marcelo.oik...@webradar.com> wrote:
> Hi, Alonso.
>
> As you can see jackson-core is provided by several libraries, try to
>> exclude it from spark-core, i think
Can you show the stack trace ?
The log message came from
DiskBlockObjectWriter#revertPartialWritesAndClose().
Unfortunately, the method doesn't throw exception, making it a bit hard for
caller to know of the disk full condition.
On Thu, Mar 31, 2016 at 11:32 AM, Abhishek Anand
wrote:
>
> Hi,
>
Looks like this is result of the following check:
val shouldReplace = output.exists(f => resolver(f.name, colName))
if (shouldReplace) {
where existing column, text, was replaced.
On Thu, Mar 31, 2016 at 12:08 PM, Jacek Laskowski wrote:
> Hi,
>
> Just ran into the following. Is this a
In general, you should implement thread-safety in your code.
Which set of events are you interested in ?
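For instance, a minimal sketch of keeping listener state in thread-safe containers (the metric chosen is illustrative):

import java.util.concurrent.atomic.AtomicLong
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

class TaskCounter extends SparkListener {
  val finishedTasks = new AtomicLong(0)
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    finishedTasks.incrementAndGet()   // safe even if callbacks arrive from different threads
  }
}

// sc.addSparkListener(new TaskCounter())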
Cheers
On Fri, Apr 1, 2016 at 9:23 AM, Truong Duc Kien
wrote:
> Hi,
>
> I need to gather some metrics using a SparkListener. Does the callback
> methods need to thread-safe or they are alwa
You can set them in spark-defaults.conf
See also https://spark.apache.org/docs/latest/configuration.html#spark-ui
On Fri, Apr 1, 2016 at 8:26 AM, Max Schmidt wrote:
> Can somebody tell me the interaction between the properties:
>
> spark.ui.retainedJobs
> spark.ui.retainedStages
> spark.history
bq. This was a big help!
The email (maybe only addressed to you) didn't come with your latest reply.
Do you mind sharing it ?
Thanks
On Fri, Apr 1, 2016 at 11:37 AM, ludflu wrote:
> This was a big help! For the benefit of my fellow travelers running spark
> on
> EMR:
>
> I made a json file wi
hem for the history-server? The daemon? The workers?
>
> And what if I use the java API instead of spark-submit for the jobs?
>
> I guess that the spark-defaults.conf are obsolete for the java API?
>
>
> Am 2016-04-01 18:58, schrieb Ted Yu:
>
>> You can set them in
Assuming your code is written in Scala, I would suggest using ScalaTest.
Please take a look at the XXSuite.scala files under mllib/
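A minimal sketch of such a suite (data and assertion are illustrative), assuming scalatest is on the test classpath:

import org.scalatest.FunSuite
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

class MyModelSuite extends FunSuite {
  test("KMeans separates two obvious clusters") {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("test"))
    try {
      val data = sc.parallelize(Seq(
        Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
        Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)))
      val model = KMeans.train(data, 2, 10)   // k = 2, maxIterations = 10
      assert(model.clusterCenters.length == 2)
    } finally {
      sc.stop()
    }
  }
}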
On Fri, Apr 1, 2016 at 1:31 PM, Shishir Anshuman
wrote:
> Hello,
>
> I have a code written in scala using Mllib. I want to perform unit testing
> it. I cant decide
"1.6.0", "org.apache.spark" % "spark-mllib_2.10" % "1.6.0" )*
>
>
>
>
> On Sat, Apr 2, 2016 at 2:21 AM, Ted Yu wrote:
>
>> Assuming your code is written in Scala, I would suggest using ScalaTest.
>>
>> Please take a look at t
Thanks for sharing the workaround.
Probably send a PR on tranquilizer github :-)
On Fri, Apr 1, 2016 at 12:50 PM, Marcelo Oikawa wrote:
> Hi, list.
>
> Just to close the thread. Unfortunately, I didnt solve the jackson lib
> problem but I did a workaround that works fine for me. Perhaps this he
; When I added *"org.apache.spark" % "spark-core_2.10" % "1.6.0", *it
> should include spark-core_2.10-1.6.1-tests.jar.
> Why do I need to use the jar file explicitly?
>
> And how do I use the jars for compiling with *sbt* and running the tests
> on
Looking at the implementation for lookup in PairRDDFunctions, I think your
understanding is correct.
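A minimal sketch (illustrative data) of what that means in practice:

val pairs = sc.parallelize(Seq((1, "a"), (2, "b"), (1, "c")))
val vals: Seq[String] = pairs.lookup(1)   // Seq("a", "c"), materialized on the driver

// so the result for a given key should be small enough to fit in driver memory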
On Sat, Apr 2, 2016 at 3:16 AM, Nirav Patel wrote:
> I will start by question: Is spark lookup function on pair rdd is a driver
> action. ie result is returned to driver?
>
> I have list of Keys
bq. split"\t," splits the filter by carriage return
Minor correction: "\t" denotes tab character.
On Sun, Apr 3, 2016 at 7:24 AM, Eliran Bivas wrote:
> Hi Mich,
>
> 1. The first underscore in your filter call is refering to a line in the
> file (as textFile() results in a collection of strings)
showlines = messages.filter(_ contains("ASE 15")).filter(_
> contains("UPDATE INDEX STATISTICS")).flatMap(line =>
> line.split("\n,")).map(word => (word, 1)).reduceByKey(_ +
> _).collect.foreach(println)
>
>
> How does one refer to the conten
l v = lines.filter(_.contains("ASE 15")).filter(_
>> contains("UPDATE INDEX STATISTICS")).flatMap(line =>
>> line.split("\n,")).map(word => (word, 1)).reduceByKey(_ +
>> _).collect.foreach(println)
>>
>>
terialize all rows?
>
> Cheers
>
bq. the modifications do not touch the scheduler
If the changes can be ported over to 1.6.1, do you mind reproducing the
issue there ?
I ask because master branch changes very fast. It would be good to narrow
down where the behavior you observed first started showing up.
On Mon, Apr 4, 2016 at 6:12
bq. I'm on version 2.10 for spark
The above is Scala version.
Can you give us the Spark version ?
Thanks
On Mon, Apr 4, 2016 at 2:36 PM, mpawashe wrote:
> Hi all,
>
> I am using Spark Streaming API (I'm on version 2.10 for spark and
> streaming), and I am running into a function serialization
Did you define idxmax() method yourself ?
Thanks
On Tue, Apr 5, 2016 at 4:17 AM, Angel Angel wrote:
> Hello,
>
> i am writing one spark application i which i need the index of the maximum
> element.
>
> My table has one column only and i want the index of the maximum element.
>
> MAX(count)
> 2
The error was due to the REPL expecting an integer (an index into the Array)
whereas "MAX(count)" was a String.
What do you want to achieve ?
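In case the goal is simply the position of the largest value in a single numeric column, a minimal RDD-based sketch (illustrative data):

val values = sc.parallelize(Seq(3L, 9L, 2L, 7L))
val (maxValue, maxIndex) =
  values.zipWithIndex().reduce((a, b) => if (a._1 >= b._1) a else b)
// maxValue = 9, maxIndex = 1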
On Tue, Apr 5, 2016 at 4:17 AM, Angel Angel wrote:
> Hello,
>
> i am writing one spark application i which i need the index of the maximum
> element.
>
> My table
Which hadoop release are you using ?
bq. yarn cluster with 2GB RAM
I assume 2GB is per node. Isn't this too low for your use case ?
Cheers
On Wed, Apr 6, 2016 at 9:19 AM, Peter Rudenko
wrote:
> Hi i have a situation, say i have a yarn cluster with 2GB RAM. I'm
> submitting 2 spark jobs with "
Have you looked at SparkListener ?
/**
* Called when the driver registers a new executor.
*/
def onExecutorAdded(executorAdded: SparkListenerExecutorAdded): Unit
/**
* Called when the driver removes an executor.
*/
def onExecutorRemoved(executorRemoved: SparkListenerExecutorRemoved): Unit
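Putting those callbacks together, a minimal sketch of a listener you could register via sc.addSparkListener:

import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorAdded, SparkListenerExecutorRemoved}

class ExecutorTracker extends SparkListener {
  override def onExecutorAdded(added: SparkListenerExecutorAdded): Unit =
    println(s"executor added: ${added.executorId} with ${added.executorInfo.totalCores} cores")
  override def onExecutorRemoved(removed: SparkListenerExecutorRemoved): Unit =
    println(s"executor removed: ${removed.executorId}, reason: ${removed.reason}")
}

// sc.addSparkListener(new ExecutorTracker())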
This is the version of Kafka Spark depends on:
[INFO] +- org.apache.kafka:kafka_2.10:jar:0.8.2.1:compile
On Thu, Apr 7, 2016 at 9:14 AM, Haroon Rasheed
wrote:
> Try removing libraryDependencies += "org.apache.kafka" %% "kafka" % "1.6.0"
> compile. I guess the internal dependencies are automatic
Which Spark release are you using ?
Have you registered to all the events provided by SparkListener ?
If so, can you do event-wise summation of execution time ?
Thanks
On Thu, Apr 7, 2016 at 11:03 AM, JasmineGeorge wrote:
> We are running a batch job with the following specifications
> •
Looks like you're using Spark 1.6.x
What error(s) did you get for the first two joins ?
Thanks
On Fri, Apr 8, 2016 at 3:53 AM, JH P wrote:
> Hi. I want a dataset join with itself. So i tried below codes.
>
> 1. newGnsDS.joinWith(newGnsDS, $"dataType”)
>
> 2. newGnsDS.as("a").joinWith(newGnsDS.
I searched 1.6.1 code base but didn't find how this can be configured
(within Spark).
On Fri, Apr 8, 2016 at 9:01 AM, nihed mbarek wrote:
> Hi
> How to configure parquet.block.size on Spark 1.6 ?
>
> Thank you
> Nihed MBAREK
>
>
> --
>
> M'BAREK Med Nihed,
> Fedora Ambassador, TUNISIA, Northern
Did you encounter similar error on a smaller dataset ?
Which release of Spark are you using ?
Is it possible you have an incompatible snappy version somewhere in your
classpath ?
Thanks
On Fri, Apr 8, 2016 at 12:36 PM, entee wrote:
> I'm trying to do a relatively large join (0.5TB shuffle rea
> pyspark.sql on a Spark DataFrame.
>
> Any ideas?
>
> Nicolas
>
> On Fri, Apr 8, 2016 at 1:13 PM, Ted Yu wrote:
>
>> Did you encounter similar error on a smaller dataset ?
>>
>> Which release of Spark are you using ?
>>
>> Is it possible
mahesh :
bq. :16: error: not found: value sqlContext
Please take a look at:
https://spark.apache.org/docs/latest/sql-programming-guide.html#starting-point-sqlcontext
for how the import should be used.
Please include version of Spark and the commandline you used in the reply.
The value was out of the range of integer.
Which Spark release are you using ?
Can you post snippet of code which can reproduce the error ?
Thanks
On Sat, Apr 9, 2016 at 12:25 PM, SURAJ SHETH wrote:
> I am trying to perform some processing and cache and count the RDD.
> Any solutions?
>
> See
Looks like the exception occurred on driver.
Consider increasing the values for the following config:
conf.set("spark.driver.memory", "10240m")
conf.set("spark.driver.maxResultSize", "2g")
Cheers
On Sat, Apr 9, 2016 at 9:02 PM, Buntu Dev wrote:
> I'm running it via pyspark against yarn in cli
Haven't found any JIRA w.r.t. combineByKey for Dataset.
What's your use case ?
Thanks
On Sat, Apr 9, 2016 at 7:38 PM, Amit Sela wrote:
> Is there (planned ?) a combineByKey support for Dataset ?
> Is / Will there be a support for combiner lifting ?
>
> Thanks,
> Amit
>
nd Events.
>
> I can do the event wise summation for couple of runs and get back to you.
>
>
>
> Thanks,
>
> Jasmine
>
>
>
> *From:* Ted Yu [mailto:yuzhih...@gmail.com]
> *Sent:* Thursday, April 07, 2016 1:43 PM
> *To:* JasmineGeorge
> *Cc:* user
> *
mbda x : x.rsplit('\t',1)).map(lambda x :
> [x[0],getRows(x[1])]).cache()\
> .groupBy(lambda x : x[0].split('\t')[1]).mapValues(lambda x :
> list(x)).cache()
>
> text1.count()
>
> Thanks and Regards,
> Suraj Sheth
>
> On Sun, Apr 10, 2016 at
For SparkR, please refer to https://spark.apache.org/docs/latest/sparkr.html
bq. on Ubuntu or CentOS
Both platforms are supported.
On Mon, Apr 11, 2016 at 1:08 PM, wrote:
> Dear Experts ,
>
> I am posting this for your information. I am a newbie to spark.
> I am interested in understanding Spa
Please take a look at
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
Cheers
On Mon, Apr 11, 2016 at 12:13 PM, Radhakrishnan Iyer <
radhakrishnan.i...@citiustech.com> wrote:
> Hi all,
>
>
>
> I am new to Spark.
>
> I have a json in below format :
>
> Empl
See
https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
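The fair-scheduling pattern described there, as a minimal sketch (pool name and data are illustrative):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("fair-demo")
  .set("spark.scheduler.mode", "FAIR")
val sc = new SparkContext(conf)

// jobs submitted from this thread are assigned to pool1
sc.setLocalProperty("spark.scheduler.pool", "pool1")
sc.parallelize(1 to 1000000).count()
// reset to the default pool
sc.setLocalProperty("spark.scheduler.pool", null)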
On Mon, Apr 11, 2016 at 3:15 PM, Jialin Liu wrote:
> Hi Spark users/experts,
>
> I’m wondering how does the Spark scheduler work?
> What kind of resources will be considered during the scheduling, does
gen-idea doesn't seem to be a valid command:
[warn] Ignoring load failure: no project loaded.
[error] Not a valid command: gen-idea
[error] gen-idea
On Tue, Apr 12, 2016 at 8:28 AM, ImMr.K <875061...@qq.com> wrote:
> Hi,
> I have cloned spark and ,
> cd spark
> build/sbt gen-idea
>
> got the fol
See
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-IDESetup
On Tue, Apr 12, 2016 at 8:52 AM, ImMr.K <875061...@qq.com> wrote:
> But how to import spark repo into idea or eclipse?
>
>
>
> -- Original Message ---------
bq. Most recent failure cause:
Can you paste the remaining cause ?
Which Spark release are you using ?
Thanks
On Tue, Apr 12, 2016 at 1:10 PM, AlexModestov
wrote:
> I get an error while I form a dataframe from the parquet file:
>
> Py4JJavaError: An error occurred while calling
> z:org.apache
You can find various examples involving Serializable Java POJO
e.g.
./examples/src/main/java/org/apache/spark/examples/ml/JavaALSExample.java
Please pastebin some details on 'Task not serializable error'
Thanks
On Tue, Apr 12, 2016 at 12:44 PM, Daniel Valdivia
wrote:
> Hi,
>
> I'm moving some
FYI
https://documentation.cpanel.net/display/CKB/How+To+Clear+Your+DNS+Cache#HowToClearYourDNSCache-MacOS®10.10
https://www.whatsmydns.net/flush-dns.html#linux
On Tue, Apr 12, 2016 at 2:44 PM, Bibudh Lahiri
wrote:
> Hi,
>
> I am trying to run a piece of code with logistic regression on
> P
bq. --conf "spark.executor.extraJavaOptions=-Dlog4j.
configuration=env/dev/log4j-driver.properties"
I think the above may have a typo : you refer to log4j-driver.properties in
both arguments.
FYI
On Wed, Apr 13, 2016 at 8:09 AM, Carlos Rojas Matas
wrote:
> Hi guys,
>
> I'm trying to enable log
w.r.t. the effective storage level log, here is the JIRA which introduced
it:
[SPARK-4671][Streaming]Do not replicate streaming block when WAL is enabled
On Wed, Apr 13, 2016 at 7:43 AM, Patrick McGloin
wrote:
> Hi all,
>
> If I am using a Custom Receiver with Storage Level set to StorageLevel.
Can you pastebin the failure message ?
Did you happen to take jstack during the close ?
Which Hadoop version do you use ?
Thanks
> On Apr 14, 2016, at 5:53 AM, nihed mbarek wrote:
>
> Hi,
> I have an issue with closing my application context, the process take a long
> time with a fail at t
bq. localtest.txt#appSees.txt
Which file did you want to pass ?
Thanks
On Thu, Apr 14, 2016 at 2:14 PM, Benjamin Zaitlen
wrote:
> Hi All,
>
> I'm trying to use the --files option with yarn:
>
> spark-submit --master yarn-cluster /home/ubuntu/test_spark.py --files
>> /home/ubuntu/localtest.txt#
For Parquet, please take a look at SPARK-1251
For ORC, not sure.
Looking at git history, I found ORC mentioned by SPARK-1368
FYI
On Thu, Apr 14, 2016 at 6:53 PM, Edmon Begoli wrote:
> I am needing this fact for the research paper I am writing right now.
>
> When did Spark start supporting Parq
You can call stop() method.
> On Apr 15, 2016, at 5:21 AM, ram kumar wrote:
>
> Hi,
> I started hivecontext as,
>
> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc);
>
> I want to stop this sql context
>
> Thanks
See this thread: http://search-hadoop.com/m/q3RTtsFrd61q291j1
On Fri, Apr 15, 2016 at 5:38 AM, Carlos Rojas Matas
wrote:
> Hi guys,
>
> any clue on this? Clearly the
> spark.executor.extraJavaOpts=-Dlog4j.configuration is not working on the
> executors.
>
> Thanks,
> -carlos.
>
> On Wed, Apr 13,
Please send query to user@hbase
This is the default value:
zookeeper.znode.parent = /hbase
Looks like hbase-site.xml accessible on your client didn't have up-to-date
value for zookeeper.znode.parent
Please make sure hbase-site.xml with proper config is on the classpath.
On Sat, Apr 16, 20
Looks like this question is more relevant on flink mailing list :-)
On Sat, Apr 16, 2016 at 8:52 AM, Mich Talebzadeh
wrote:
> Hi,
>
> Has anyone used Apache Flink instead of Spark by any chance
>
> I am interested in its set of libraries for Complex Event Processing.
>
> Frankly I don't know if
Kevin:
Can you describe how you got over the Metadata fetch exception ?
> On Apr 16, 2016, at 9:41 AM, Kevin Eid wrote:
>
> One last email to announce that I've fixed all of the issues. Don't hesitate
> to contact me if you encounter the same. I'd be happy to help.
>
> Regards,
> Kevin
>
>> O
From the output you posted:
---
Unpacking Spark
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
---
The artifact for spark-1.6.1-bin-hadoop2.6 is corrupt.
This problem has been reported in other threads.
Try spark-1.6.1-bin-hadoop
Apr 16, 2016 at 2:14 PM, Ted Yu wrote:
> From the output you posted:
> ---
> Unpacking Spark
>
> gzip: stdin: not in gzip format
> tar: Child returned status 1
> tar: Error is not recoverable: exiting now
> ---
>
> The artifact for spark-1.6.1-bin-hadoop2.6 i
to
> https://s3.amazonaws.com/spark-related-packages/spark-1.6.1-bin-hadoop2.7.tgz
> and I get a NoSuchKey error.
>
> Should I just go with it even though it says hadoop2.6?
>
> On Sat, Apr 16, 2016 at 5:37 PM, Ted Yu wrote:
>
>> BTW this was the original thread:
bucket, so hopefully everything should be
> working now. Let me know if you still encounter any problems with
> unarchiving.
>
> On Sat, Apr 16, 2016 at 3:10 PM Ted Yu wrote:
>
>> Pardon me - there is no tarball for hadoop 2.7
>>
>> I downloaded
>> https://s
tely not working, at least for logging configuration.
>
> Thanks,
> -carlos.
>
> On Fri, Apr 15, 2016 at 3:28 PM, Ted Yu wrote:
>
>> See this thread: http://search-hadoop.com/m/q3RTtsFrd61q291j1
>>
>> On Fri, Apr 15, 2016 at 5:38 AM, Carlos Ro
The CatalogTracker object may not be used by all the methods of HBaseAdmin.
Meaning, when HBaseAdmin is constructed, we don't need CatalogTracker.
On Tue, Apr 19, 2016 at 6:09 AM, WangYQ wrote:
> in hbase 0.98.10, class "HBaseAdmin "
> line 303, method "tableExists", will create a catal
Using
http://www.ruddwire.com/handy-code/date-to-millisecond-calculators/#.VxZh3iMrKuo
, 1460823008000 is shown to be 'Sat Apr 16 2016 09:10:08 GMT-0700'
Can you clarify the 4 day difference ?
bq. 'right now April 14th'
The date of your email was Apr 16th.
On Sat, Apr 16, 2016 at 9:39 AM, Hemal
Can you tell us the memory parameters you used ?
If you can capture jmap before the GC limit was exceeded, that would give us
more clue.
Thanks
> On Apr 19, 2016, at 7:40 PM, "kramer2...@126.com" wrote:
>
> Hi All
>
> I use spark doing some calculation.
> The situation is
> 1. New file wi
Do you mind trying out build from master branch ?
1.5.3 is a bit old.
On Wed, Apr 20, 2016 at 5:25 AM, FangFang Chen
wrote:
> I found Spark SQL loses precision and handles the data as int under some rule.
> Following is data got via hive shell and spark sql, with same sql to same
> hive table:
> Hive
Please take a look at:
https://spark.apache.org/docs/latest/sparkr.html#sparkr-dataframes
On Wed, Apr 20, 2016 at 9:50 AM, Ashok Kumar
wrote:
> Hi,
>
> I have Spark 1.6.1 but I do bot know how to invoke SparkR so I can use R
> with Spark.
>
> Is there a s hell similar to spark-shell that support
FileStatusCache used to be inside interfaces.scala
But in master branch, I no longer see it there.
Looks like refactor has removed the class.
On Wed, Apr 20, 2016 at 11:19 AM, Ditesh Kumar
wrote:
> Hi,
>
> When creating a DataFrame from a partitioned file structure (
> sqlContext.read.parquet("
The weight field is not nullable.
Looks like your source table had null value for this field.
On Wed, Apr 20, 2016 at 4:11 PM, Charles Nnamdi Akalugwu <
cprenzb...@gmail.com> wrote:
> Hi,
>
> I am using spark 1.4.1 and trying to copy all rows from a table in one
> MySQL Database to a Amazon RDS
>
> Can't translate null value for field
> StructField(density,DecimalType(4,2),true)
> On Apr 21, 2016 1:37 AM, "Ted Yu" wrote:
>
>> The weight field is not nullable.
>>
>> Looks like your source table had null value for this field.
>>
>>
In upcoming 2.0 release, the signature for map() has become:
def map[U : Encoder](func: T => U): Dataset[U] = withTypedPlan {
Note: DataFrame and DataSet are unified in 2.0
FYI
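A minimal sketch against the 2.0-style unified API (spark-shell, names illustrative):

case class Text(id: Int, text: String)
val ds = Seq(Text(0, "hello"), Text(1, "world")).toDS()
val lengths = ds.map(_.text.length)   // Dataset[Int]; the Encoder comes from spark.implicits._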
On Thu, Apr 21, 2016 at 6:49 AM, Apurva Nandan wrote:
> Hello everyone,
>
> Generally speaking, I guess it's well