On Wed, Mar 9, 2016 at 9:24 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Hadoop glob pattern doesn't support multi level wildcard.
>>
>> Thanks
>>
>> On Mar 9, 2016, at 6:15 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>> if its based on
bq. it is kind of columnar NoSQL database.
The storage format in HBase is not columnar.
I would suggest you build upon what you already know (Spark and Hive) and
expand on that. Also, if your work uses Big Data technologies, those would
be the first to consider getting to know better.
On Wed,
Hadoop glob pattern doesn't support multi level wildcard.
Thanks
> On Mar 9, 2016, at 6:15 AM, Koert Kuipers <ko...@tresata.com> wrote:
>
> if its based on HadoopFsRelation shouldn't it support it? HadoopFsRelation
> handles globs
>
>> On Wed, Mar 9, 2016
This is currently not supported.
> On Mar 9, 2016, at 4:38 AM, Jakub Liska wrote:
>
> Hey,
>
> is something like this possible?
>
> sqlContext.read.json("/mnt/views-p/base/2016/01/*/*-xyz.json")
>
> I switched to DataFrames because my source files changed from TSV to
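For reference, a hedged workaround sketch for the glob question above (the day
range and path layout are assumptions): keep a single wildcard per directory
level and union the per-day reads yourself:

  val days = (1 to 31).map(d => f"/mnt/views-p/base/2016/01/$d%02d/*-xyz.json")
  val df = days.map(p => sqlContext.read.json(p)).reduce(_ unionAll _)

This only works if every day's files share a compatible schema.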
drop down menu on the right hand side of the Create button (it looks as if
>> it's part of the button) - when I clicked directly on the word "Create" I
>> got a form that made more sense and allowed me to choose the project.
>>
>> Regards,
>>
>>
h Bajaj
>
> On Tue, Mar 8, 2016 at 6:25 PM, Andy Davidson <
> a...@santacruzintegration.com> wrote:
>
>> Hi Ted
>>
>> I believe by default cassandra listens on 9042
>>
>> From: Ted Yu <yuzhih...@gmail.com>
>> Date: Tuesday, March 8, 2016 a
Have you contacted spark-cassandra-connector related mailing list ?
I wonder where the port 9042 came from.
Cheers
On Tue, Mar 8, 2016 at 6:02 PM, Andy Davidson wrote:
>
> I am using spark-1.6.0-bin-hadoop2.6. I am trying to write a python
> notebook that reads
That may miss the 15th minute of the hour (with non-trivial deviation),
right ?
On Tue, Mar 8, 2016 at 8:50 AM, ayan guha wrote:
> Why not compare the current time in every batch and, when it meets a certain
> condition, emit the data?
> On 9 Mar 2016 00:19, "Abhishek Anand"
This seems related:
the second paragraph under Implementation and theory
https://en.wikipedia.org/wiki/Closure_(computer_programming)
On Tue, Mar 8, 2016 at 4:49 AM, Minglei Zhang wrote:
> hello, experts.
>
> I am a student. and recently, I read a paper about *Actor
Josh:
SerializerInstance and SerializationStream would also become private[spark],
right ?
Thanks
On Mon, Mar 7, 2016 at 6:57 PM, Josh Rosen wrote:
> Does anyone implement Spark's serializer interface
> (org.apache.spark.serializer.Serializer) in your own third-party
in, Atlas,
> Ranger, Apache Infrastructure. There doesn't seem to be an option for me to
> raise an issue for Spark?!
>
> Regards,
>
> James
>
>
> On 4 March 2016 at 14:03, James Hammerton <ja...@gluru.co> wrote:
>
>> Sure thing, I'll see if I can isolate th
w.r.t. akka, please see the following:
[SPARK-7997][CORE] Remove Akka from Spark Core and Streaming
There are various ways to design a distributed system. Can you outline what
your program does ?
Cheers
On Sun, Mar 6, 2016 at 8:35 AM, Minglei Zhang wrote:
> hello, experts
Have you taken a look at SPARK-12739 ?
FYI
On Sun, Mar 6, 2016 at 4:06 AM, Jatin Kumar <
jku...@rocketfuelinc.com.invalid> wrote:
> Hello all,
>
> Consider following two code blocks:
>
> val ssc = new StreamingContext(sparkConfig, Seconds(2))
> val stream = KafkaUtils.createDirectStream(...)
>
ds,
> Gourav Sengupta
>
>> On Sun, Mar 6, 2016 at 11:48 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>> Gourav:
>> For the 3rd paragraph, did you mean the job seemed to be idle for about 5
>> minutes ?
>>
>> Cheers
>>
>>> On Mar 6, 2016, at
Gourav:
For the 3rd paragraph, did you mean the job seemed to be idle for about 5
minutes ?
Cheers
> On Mar 6, 2016, at 3:35 AM, Gourav Sengupta wrote:
>
> Hi,
>
> This is a solved problem, try using s3a instead and everything will be fine.
>
> Besides that you
Looking at the methods you call on HiveContext, they seem to belong
to SQLContext.
For SQLContext, you can use the method below in FirstQuery to
retrieve the SQLContext:
def getOrCreate(sparkContext: SparkContext): SQLContext = {
FYI
On Sat, Mar 5, 2016 at 3:37 PM, Mich Talebzadeh
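A minimal usage sketch of the method above (variable names are hypothetical;
assumes an existing SparkContext sc, e.g. in spark-shell):

  val sqlContext = org.apache.spark.sql.SQLContext.getOrCreate(sc)
  val df = sqlContext.range(10)

getOrCreate reuses the singleton SQLContext tied to the given SparkContext, or
creates one if none exists yet.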
bq. I haven't added one more HDFS node to a hadoop cluster
Does each of the three nodes colocate with HDFS data nodes ?
The absence of a 4th data node might have something to do with the partition
allocation.
Can you show your code snippet ?
Thanks
On Sat, Mar 5, 2016 at 2:54 PM, Eugene Morozov
bq. reportError("Exception while streaming travis", e)
I assume there was none of the above in your job.
What Spark release are you using ?
Thanks
On Sat, Mar 5, 2016 at 4:57 AM, Dominik Safaric
wrote:
> Dear all,
>
> Lately, as a part of a scientific
a member of Seq[(String, Int)]
>> [error] val b = a.toDF("Name","score").registerTempTable("tmp")
>> [error] ^
>> [error]
>> /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:17: not
>> found: value sql
>> [error] sql("select
ain/scala/Sequence.scala:19: value
> toDF is not a member of Seq[(String, Int)]
> [error] a.toDF("Name","score").sort(desc("score")).show
> [error] ^
> [error] three errors found
> [error] (compile:compileIncremental) Compilation failed
> [error
; toDF is not a member of Seq[(String, Int)]
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn *
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
bq. However the method does not seem inherited to HiveContext.
Can you clarify the above observation ?
HiveContext extends SQLContext .
On Fri, Mar 4, 2016 at 1:23 PM, jelez wrote:
> What is the best approach to use getOrCreate for streaming job with
> HiveContext.
> It
Can you add the following into your code ?
import sqlContext.implicits._
On Fri, Mar 4, 2016 at 1:14 PM, Mich Talebzadeh
wrote:
> Hi,
>
> I have a simple Scala program as below
>
> import org.apache.spark.SparkContext
> import org.apache.spark.SparkContext._
> import
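For reference, a hedged sketch of what the compiling version might look like
(the data is made up; column names are taken from the error messages above):

  // assumes spark-shell, where sc is predefined
  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
  import sqlContext.implicits._                     // brings toDF into scope
  import org.apache.spark.sql.functions.desc

  val a = Seq(("Alice", 10), ("Bob", 5))            // hypothetical data
  a.toDF("Name", "score").registerTempTable("tmp")
  sqlContext.sql("select Name, score from tmp").show()
  a.toDF("Name", "score").sort(desc("score")).show()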
assLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 11 more
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn *
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https
Can you show the complete stack trace ?
It was not clear which class's definition was not found.
On Fri, Mar 4, 2016 at 6:46 AM, Mich Talebzadeh
wrote:
> Hi,
>
> I have a simple Scala code that I want to use it in an sbt project.
>
> It is pretty simple but imports
Have you taken a look at https://parquet.apache.org/community/ ?
On Thu, Mar 3, 2016 at 7:32 PM, ashokkumar rajendran <
ashokkumar.rajend...@gmail.com> wrote:
> Hi,
>
> I am exploring to use Apache Parquet with Spark SQL in our project. I
> notice that Apache Parquet uses different encoding for
bq. that solved some problems
Is there any problem that was not solved by the tweak ?
Thanks
On Thu, Mar 3, 2016 at 4:11 PM, Eugen Cepoi wrote:
> You can limit the amount of memory spark will use for shuffle even in 1.6.
> You can do that by tweaking the
bq. hConf.setBoolean("hbase.cluster.distributed", true)
Not sure why the above is needed. If hbase-site.xml is on the classpath, it
should contain the above setting already.
FYI
On Thu, Mar 3, 2016 at 6:08 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> From the log
From the log snippet you posted, it was not clear why the connection got lost. You
can lower the value for caching and see if GC activity gets lower.
How wide are the rows in the HBase table ?
Thanks
> On Mar 3, 2016, at 1:01 AM, Nirav Patel wrote:
>
> so why does
Have you seen the thread 'Filter on a column having multiple values' where
Michael gave this example ?
https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/107522969592/2840265927289860/2388bac36e.html
FYI
On Wed, Mar 2, 2016 at
You only showed one record from each table.
Have you looked at the following method in DataFrame ?
def unionAll(other: DataFrame): DataFrame = withPlan {
On Tue, Mar 1, 2016 at 7:13 PM, Angel Angel wrote:
> Hello Sir/Madam,
>
> I am using the spark sql for the data
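A minimal sketch of unionAll (the data is hypothetical; both DataFrames must
have the same schema):

  import sqlContext.implicits._
  val df1 = Seq((1, "a"), (2, "b")).toDF("id", "value")
  val df2 = Seq((3, "c")).toDF("id", "value")
  df1.unionAll(df2).show()    // unionAll in 1.x; renamed to union in Spark 2.0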
See this in source repo:
./.idea/projectCodeStyle.xml
On Tue, Mar 1, 2016 at 6:55 PM, zml张明磊 wrote:
> Hello,
>
>
>
> Appreciate if you have xml file with the following style code ?
>
> https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
>
>
>
>
Using pastebin seems to be better.
The attachment may not go through.
FYI
On Tue, Mar 1, 2016 at 6:07 PM, Jeff Zhang wrote:
> Do you mind to attach the whole yarn app log ?
>
> On Wed, Mar 2, 2016 at 10:03 AM, Nirav Patel
> wrote:
>
>> Hi Ryan,
>>
>>
Do you mind pastebin'ning the stack trace with the error so that we know
which part of the code is under discussion ?
Thanks
On Tue, Mar 1, 2016 at 7:48 AM, Peter Halliday wrote:
> I have a Spark application that has a Task seem to fail, but it actually
> did write out some
RDD serialized by one release of Spark is not guaranteed to be readable by
another release of Spark.
Please check whether there are mixed Spark versions.
FYI:
http://stackoverflow.com/questions/10378855/java-io-invalidclassexception-local-class-incompatible
On Tue, Mar 1, 2016 at 7:35 AM,
component being built with different
release of hbase.
Try setting "hbase.defaults.for.version.skip" to true.
Cheers
On Mon, Feb 29, 2016 at 9:12 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> 16/02/29 23:09:34 INFO ZooKeeper: Initiating client connection,
> connec
inaccessible to your Spark job.
Please add it in your classpath.
On Mon, Feb 29, 2016 at 8:42 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> 16/02/29 23:09:34 INFO ClientCnxn: Opening socket connection to server
> localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using
>
16/02/29 23:09:34 INFO ClientCnxn: Opening socket connection to server
localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL
(unknown error)
Is your cluster a secure cluster ?
bq. Trace :
Was there any output after 'Trace :' ?
Was hbase-site.xml accessible to your Spark job
Divya:
Please try not to cross post your question.
In your case the hbase-common jar is needed. To find all the hbase jars needed, you
can run 'mvn dependency:tree' and check its output.
> On Feb 29, 2016, at 1:48 AM, Divya Gehlot wrote:
>
> Hi,
> I am trying to access
http://www.amazon.com/Scala-Spark-Alexy-Khrabrov/dp/1491929286/ref=sr_1_1?ie=UTF8=1456696284=8-1=spark+dataframe
There is another one from Wiley (to be published on March 21):
"Spark: Big Data Cluster Computing in Production," written by Ilya Ganelin,
Brennon York, Kai Sasaki, and Ema Orhian
On
ase module only but the problem
> is when I do the bulk load it shows data skew and takes time to create the
> hfile.
> On 26 Feb 2016 10:25 p.m., "Ted Yu" <yuzhih...@gmail.com> wrote:
>
>> In hbase, there is hbase-spark module which supports bulk load.
>>
In hbase, there is hbase-spark module which supports bulk load.
This module is to be backported in the upcoming 1.3.0 release.
There is some pending work, such as HBASE-15271 .
FYI
On Fri, Feb 26, 2016 at 8:50 AM, Renu Yadav wrote:
> Has anybody implemented bulk load into
I think Ningjun was looking for a programmatic way of tracking progress.
I took a look at:
./core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala
but there don't seem to be fine-grained events directly reflecting
what Ningjun is looking for.
On Tue, Feb 23, 2016 at 11:24 AM, Kevin
Please take a look at the following if you can utilize a Hive UDF:
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUdfSuite.scala
On Tue, Feb 23, 2016 at 6:28 AM, Chandeep Singh wrote:
> This should help -
>
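In case it helps, a hedged sketch of registering a Hive UDF through
HiveContext (the jar path, function name and class are placeholders):

  // assumes an existing HiveContext named hiveContext
  hiveContext.sql("ADD JAR /path/to/my-udfs.jar")
  hiveContext.sql("CREATE TEMPORARY FUNCTION my_upper AS 'com.example.MyUpperUDF'")
  hiveContext.sql("SELECT my_upper(name) FROM people").show()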
For receiver approach, have you tried Ryan's workaround ?
Btw I don't see the errors you faced because there was no attachment.
> On Feb 23, 2016, at 3:39 AM, vaibhavrtk1 wrote:
>
> Hello
>
> I have tried with Direct API but i am getting this an error, which is
Which line is line 42 in your code ?
When variable lines becomes empty, you can stop your program.
Cheers
> On Feb 23, 2016, at 12:25 AM, Femi Anthony wrote:
>
> I am working on Spark Streaming API and I wish to stream a set of
> pre-downloaded web log files continuously
Which Hadoop release did you build Spark against ?
Can you give the full stack trace ?
> On Feb 22, 2016, at 9:38 PM, Arunkumar Pillai wrote:
>
> Hi When i try to start spark-shell
> I'm getting following error
>
>
> Exception in thread "main"
Mich:
Please refer to the following test suite for examples on various DataFrame
operations:
sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
On Mon, Feb 22, 2016 at 4:39 PM, Mich Talebzadeh <
mich.talebza...@cloudtechnologypartners.co.uk> wrote:
> Thanks Dean.
>
> I gather if
}
>>> else{
>>> return new Tuple2<>(key, null);
>>> }
>>> }
>>> else{
>>>
Please see
SPARK-1762 Add functionality to pin RDDs in cache
On Mon, Feb 22, 2016 at 6:43 AM, Pietro Gentile <
pietro.gentile89.develo...@gmail.com> wrote:
> Hi all,
>
> Is there a way to prevent eviction of the RDD from SparkContext ?
> I would not use the cache with its default behavior
The referenced benchmark is in Chinese. Please provide an English version so
that more people can understand.
For item 7, looks like the speed of ingest is much slower compared to using
Parquet.
Cheers
On Mon, Feb 22, 2016 at 6:12 AM, 开心延年 wrote:
> 1.ya100 is not only the
rflow workflow scheduler:
> https://github.com/fluxcapacitor/pipeline/wiki
>
> my advice with spark streaming is to get the data out of spark streaming
> as quickly as possible - and into a more durable format more suitable for
> aggregation and compute.
>
> this greatly simplif
w.r.t. cleaner TTL, please see:
[SPARK-7689] Remove TTL-based metadata cleaning in Spark 2.0
FYI
On Sun, Feb 21, 2016 at 4:16 AM, Gerard Maas wrote:
>
> It sounds like another window operation on top of the 30-min window will
> achieve the desired objective.
> Just
I tried the following in spark-shell:
scala> val df0 = Seq(("a", "b", "c", 3), ("c", "b", "a", 3)).toDF("A", "B",
"C", "num")
df0: org.apache.spark.sql.DataFrame = [A: string, B: string ... 2 more
fields]
scala> val idList = List("1", "2", "3")
idList: List[String] = List(1, 2, 3)
scala> val
Have you looked at:
SPARK-12662 Fix DataFrame.randomSplit to avoid creating overlapping splits
Cheers
On Sat, Feb 20, 2016 at 7:01 PM, tuan3w wrote:
> I'm training a model using MLLib. When I try to split data into training
> and
> test data, I found a weird problem. I
For #2, you can filter out rows whose first column has length 0.
Cheers
> On Feb 20, 2016, at 6:59 AM, Mich Talebzadeh wrote:
>
> Thanks
>
>
> So what I did was
>
> scala> val df =
> sqlContext.read.format("com.databricks.spark.csv").option("inferSchema",
>
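A hedged sketch of that filter (the column name C0 is the spark-csv default
and is an assumption about the actual schema):

  import org.apache.spark.sql.functions.length
  val nonEmpty = df.filter(length(df("C0")) > 0)
  nonEmpty.show()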
Have you considered using a Key Value store which is accessible to both
jobs ?
The communication would take place through this store.
Cheers
On Fri, Feb 19, 2016 at 11:48 AM, Ashish Soni wrote:
> Hi ,
>
> Is there any way we can communicate across two different spark
on windows.
>
> My default the start-all.sh doesn't work and I don't see anything in
> localhos:8080
>
> I will do some more investigation and come back.
>
> Thanks again for all your help!
>
> Thanks & regards
> Arko
>
>
> On Fri, Feb 19, 2016 at 6:35
Please see https://spark.apache.org/docs/latest/spark-standalone.html
On Fri, Feb 19, 2016 at 6:27 PM, Arko Provo Mukherjee <
arkoprovomukher...@gmail.com> wrote:
> Hi,
>
> Thanks for your response, that really helped.
>
> However, I don't believe the job is being submitted. When I run spark
>
Can you clarify your question ?
Did you mean the body of your class ?
> On Feb 19, 2016, at 4:43 AM, Ashok Kumar wrote:
>
> Hi,
>
> If I define a class in Scala like
>
> case class(col1: String, col2:Int,...)
>
> and it is created how would I be able to see its
Is it possible to perform the tests using Spark 1.6.0 ?
Thanks
On Thu, Feb 18, 2016 at 9:51 PM, Prabhu Joseph
wrote:
> Hi All,
>
>When running concurrent Spark Jobs on YARN (Spark-1.5.2) which share a
> single Spark Context, the jobs take more time to complete
Richard:
Please see SPARK-9664 Use sqlContext.udf to register UDAFs
Cheers
On Thu, Feb 18, 2016 at 3:18 PM, Kabeer Ahmed
wrote:
> I use Spark 1.5 with CDH5.5 distribution and I see that support is present
> for UDAF. From the link:
>
pplin notebook
> if they do some port scanning...
>
> 2016-02-18 15:04 GMT+01:00 Gourav Sengupta <gourav.sengu...@gmail.com>:
>
>> Hi,
>>
>> Just out of sheet curiosity why are you not using EMR to start your SPARK
>> cluster?
>>
>>
>> Regard
bq. streamingContext.remember("duration") did not help
Can you give a bit more detail on the above ?
Did you mean the job encountered OOME later on ?
Which Spark release are you using ?
Cheers
On Wed, Feb 17, 2016 at 6:03 PM, ramach1776 wrote:
> We have a streaming
Have you seen this ?
HADOOP-10988
Cheers
On Thu, Feb 18, 2016 at 3:39 AM, James Hammerton wrote:
> HI,
>
> I am seeing warnings like this in the logs when I run Spark jobs:
>
> OpenJDK 64-Bit Server VM warning: You have loaded library
>
idea as to when this will be released?
>
> Thanks,
> Ben
>
>
> On Feb 17, 2016, at 2:53 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> The HBASE JIRA below is for HBase 2.0
>
> HBase Spark module would be back ported to hbase 1.3.0
>
> FYI
>
> On Feb 17, 2016,
The HBASE JIRA below is for HBase 2.0
HBase Spark module would be back ported to hbase 1.3.0
FYI
> On Feb 17, 2016, at 1:13 PM, Chandeep Singh wrote:
>
> HBase-Spark module was added in 1.3
>
> https://issues.apache.org/jira/browse/HBASE-13992
>
>
If the Accumulators are updated at the same time, calling foreach() once seems
to have better performance.
> On Feb 17, 2016, at 4:30 PM, Daniel Imberman
> wrote:
>
> Hi all,
>
> So I'm currently figuring out how to accumulate three separate accumulators:
>
> val
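A minimal sketch of the single foreach() pass (the three counters and the
update logic are hypothetical):

  val evens = sc.accumulator(0L, "evens")
  val odds  = sc.accumulator(0L, "odds")
  val total = sc.accumulator(0L, "total")

  sc.parallelize(1 to 100).foreach { x =>
    if (x % 2 == 0) evens += 1 else odds += 1   // several accumulators updated
    total += x                                  // in one pass over the data
  }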
on){
>obj.g = calculateE(e,f)
> }
> obj
> )
>
>
> So I created 1 class with all variables, and then trying to update fields
> of the same class.
>
> On Tue, Feb 16, 2016 at 11:38 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Age can be computed fr
Darren:
Can you post link to the deadlock issue you mentioned ?
Thanks
> On Feb 16, 2016, at 6:55 AM, Darren Govoni wrote:
>
> I think this is part of the bigger issue of serious deadlock conditions
> occurring in spark many of us have posted on.
>
> Would the task in
to fix the issue
>
> Regards,
> Satish Chandra
>
>
>
>
>
> On Mon, Feb 15, 2016 at 7:41 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> fill() was introduced in 1.3.1
>>
>> Can you show code snippet which reproduces the error ?
>>
Age can be computed from the birthdate.
Looks like it doesn't need to be a member of the Animal class.
If age is just for illustration, can you give an example which better
mimics the scenario you work on ?
Cheers
On Mon, Feb 15, 2016 at 8:53 PM, Hemalatha A <
hemalatha.amru...@googlemail.com>
Have you seen this thread ?
http://search-hadoop.com/m/q3RTtW43zT1e2nfb=Re+ibsnappyjava+so+failed+to+map+segment+from+shared+object
On Mon, Feb 15, 2016 at 7:09 PM, Paolo Villaflores
wrote:
>
> Hi,
>
>
>
> I am trying to run spark 1.6.0.
>
> I have previously just
Can you describe the types of query you want to perform ?
If you don't already have a data flow which is optimized for RDDs, I would
suggest using the DataFrame API (or even the Dataset API), which gives the
optimizer more room.
Cheers
On Mon, Feb 15, 2016 at 6:43 PM, Divya Gehlot
cala:120)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> thanks
>
>
>
>
> On Tuesday, 16 February 2016, 1:33, Ted Yu <yuzhih...@gmail.com> wrote:
>
>
> Here is the path to the examples jar in 1.6.0 release:
>
> ./lib/spark-examples-1.6.0-hado
If you don't modify HdfsTest.scala, there is no need to rebuild it - it is
contained in the examples jar coming with Spark release.
You can use spark-submit to run the example.
Cheers
On Mon, Feb 15, 2016 at 5:24 PM, Ashok Kumar
wrote:
> Gurus,
>
> I am trying to
Mich:
You can pass jars for driver using:
spark.driver.extraClassPath
Cheers
On Mon, Feb 15, 2016 at 1:05 AM, Mich Talebzadeh
wrote:
> Thanks Deng. Unfortunately it seems that it looks for driver-class-path as
> well
>
>
>
> For example with --jars alone I get
>
>
>
>
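A hedged sketch of the spark-submit invocation (all paths and class names are
placeholders):

  spark-submit \
    --class com.example.Main \
    --jars /opt/libs/dep.jar \
    --conf spark.driver.extraClassPath=/opt/libs/dep.jar \
    myapp.jar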
fill() was introduced in 1.3.1
Can you show code snippet which reproduces the error ?
I tried the following using spark-shell on master branch:
scala> df.na.fill(0)
res0: org.apache.spark.sql.DataFrame = [col: int]
Cheers
On Mon, Feb 15, 2016 at 3:36 AM, satish chandra j
Have you tried creating a DataFrame from the RDD and joining it with the
DataFrame which corresponds to the Hive table ?
On Sun, Feb 14, 2016 at 9:53 PM, SRK wrote:
> Hi,
>
> How to join an RDD with a hive table and retrieve only the records that I
> am
> interested. Suppose, I
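A hedged sketch of that suggestion (table, column names and data are made up;
assumes a HiveContext-backed sqlContext so the Hive table is visible):

  import sqlContext.implicits._
  val rdd    = sc.parallelize(Seq((1, "a"), (2, "b")))   // stand-in for the real RDD
  val rddDF  = rdd.toDF("id", "value")
  val hiveDF = sqlContext.table("my_hive_table")         // assumed table name
  val joined = rddDF.join(hiveDF, rddDF("id") === hiveDF("id"))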
Sounds reasonable.
Please consider posting question on Spark C* connector on their mailing
list if you have any.
On Sun, Feb 14, 2016 at 7:51 PM, Kevin Burton wrote:
> Afternoon.
>
> About 6 months ago I tried (and failed) to get Spark and Cassandra working
> together in
Do you mind trying Spark 1.6.0 ?
As far as I can tell, 'Cannot overwrite table' exception may only occur
for CreateTableUsingAsSelect when source and dest relations refer to the
same table in branch-1.6
Cheers
On Sun, Feb 14, 2016 at 9:29 PM, Ramanathan R
wrote:
> Hi
an wrote:
>
> Right, Thanks Ted.
>
> On Fri, Feb 12, 2016 at 10:21 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Minor correction: the class is CatalystTypeConverters.scala
>>
>> On Thu, Feb 11, 2016 at 8:46 PM, Yogesh Mahajan <
>> <ymaha...@snap
Maybe a comment should be added to SparkPi.scala, telling the user to look for
the value in the stdout log ?
Cheers
On Sat, Feb 13, 2016 at 3:12 AM, Chandeep Singh
wrote:
> Try looking at stdout logs. I ran the exactly same job as you and did not
> see anything on the console
I have the following for my shell:
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M
-XX:ReservedCodeCacheSize=512m"
How do you specify MAVEN_OPTS ?
Which version of Java / maven do you use ?
Cheers
On Sat, Feb 13, 2016 at 7:34 AM, Milad khajavi wrote:
> Hello,
> When I want
Please take a look
at sql/core/src/main/scala/org/apache/spark/sql/functions.scala :
def udf(f: AnyRef, dataType: DataType): UserDefinedFunction = {
UserDefinedFunction(f, dataType, None)
And sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala :
test("udf") {
val foo =
mapWithState supports checkpoint.
There has been some bug fix since release of 1.6.0
e.g.
SPARK-12591 NullPointerException using checkpointed mapWithState with
KryoSerializer
which is in the upcoming 1.6.1
Cheers
On Sat, Feb 13, 2016 at 12:05 PM, Abhishek Anand
Ovidiu-Cristian:
Please see the following JIRA / PR :
[SPARK-12251] Document and improve off-heap memory configurations
Cheers
On Thu, Feb 11, 2016 at 11:06 PM, Sea <261810...@qq.com> wrote:
> spark.memory.offHeap.enabled (default is false) , it is wrong in spark
> docs. Spark1.6 do not
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:137)
> at
> org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:304)
>
> Regards,
>
> Zsolt
>
>
> 2016-02-12 13:11 GMT+01:00 Ted Yu <yuzhih...@gmail.com>:
>
>>
>> pairs = lines.map(lambda x: (x, 1))
>> counts = pairs.reduceByKey(lambda a, b: a + b)
>> counts.collect()
>> ```
>>
>> On Fri, Feb 12, 2016 at 4:26 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> Can you give a bit more information ?
>>
Can you give a bit more information ?
release of Spark you use
full error trace
your code snippet
Thanks
On Fri, Feb 12, 2016 at 7:22 AM, Sisyphuss wrote:
> When trying the `reduceByKey` transformation on Python3.4, I got the
> following error:
>
> ImportError: No
Have you tried specifying multiple '--conf key=value' ?
Cheers
On Fri, Feb 12, 2016 at 7:44 AM, Ashish Soni wrote:
> Hi All ,
>
> How do i pass multiple configuration parameter while spark submit
>
> Please help i am trying as below
>
> spark-submit --conf
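A hedged sketch of passing several parameters, one --conf flag per key=value
pair (the keys and values are just examples):

  spark-submit \
    --class com.example.Main \
    --conf spark.executor.memory=4g \
    --conf spark.executor.cores=2 \
    --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
    myapp.jar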
Please see:
[SPARK-13086][SHELL] Use the Scala REPL settings, to enable things like `-i
file`
On Thu, Feb 11, 2016 at 1:45 AM, Mich Talebzadeh <
mich.talebza...@cloudtechnologypartners.co.uk> wrote:
> Hi,
>
>
>
> in Hive one can use -I parameter to preload certain setting into the
> beeline
I think SPARK_CLASSPATH is deprecated.
Can you show the command line launching your Spark job ?
Which Spark release do you use ?
Thanks
On Thu, Feb 11, 2016 at 5:38 PM, Charlie Wright
wrote:
> built and installed hadoop with:
> mvn package -Pdist -DskipTests -Dtar
>
The Spark driver does not run on the YARN cluster in client mode, only the
Spark executors do.
Can you check YARN logs for the failed job to see if there was more clue ?
Does the YARN cluster run the customized hadoop or stock hadoop ?
Cheers
On Thu, Feb 11, 2016 at 5:44 PM, Charlie Wright
Minor correction: the class is CatalystTypeConverters.scala
On Thu, Feb 11, 2016 at 8:46 PM, Yogesh Mahajan
wrote:
> CatatlystTypeConverters.scala has all types of utility methods to convert
> from Scala to row and vice a versa.
>
>
> On Fri, Feb 12, 2016 at 12:21 AM,
From the head of HiveThriftServer2:
* The main entry point for the Spark SQL port of HiveServer2. Starts up a
`SparkSQLContext` and a
* `HiveThriftServer2` thrift server.
Looking at HiveServer2.java from Hive, looks like it uses thrift protocol.
FYI
On Thu, Feb 11, 2016 at 9:34 AM,
bq. Whether sContext (SQLContext) will help to query in both the dataframes
and will it decide on which dataframe to query for .
Can you clarify what you were asking ?
The queries would be carried out on the respective DataFrames, as shown in your
snippet.
On Thu, Feb 11, 2016 at 8:47 AM, Gaurav
" and it gives an error message that a TypedColumn
> is expected.
>
> Regards,
> Raghava.
>
>
> On Tue, Feb 9, 2016 at 10:12 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Please take a look at:
>> sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
>>
What Partitioner do you use ?
Have you tried using RangePartitioner ?
Cheers
On Wed, Feb 10, 2016 at 3:54 PM, daze5112 wrote:
> Hi im trying to improve the performance of some code im running but have
> noticed that my distribution of my RDD across executors isn't
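A minimal sketch of repartitioning a key-value RDD with RangePartitioner (the
data and partition count are hypothetical):

  import org.apache.spark.RangePartitioner
  val pairs  = sc.parallelize(1 to 1000000).map(i => (i, i.toString))
  // sample the keys and split them into 8 roughly equal ranges
  val ranged = pairs.partitionBy(new RangePartitioner(8, pairs))
  ranged.glom().map(_.length).collect()   // inspect the per-partition sizes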
Have you tried adding hbase client jars to spark.executor.extraClassPath ?
Cheers
On Wed, Feb 10, 2016 at 12:17 AM, Prabhu Joseph
wrote:
> + Spark-Dev
>
> For a Spark job on YARN accessing hbase table, added all hbase client jars
> into spark.yarn.dist.files,
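A hedged sketch of what that could look like (jar paths are placeholders; the
exact set of HBase jars depends on your distribution):

  spark-submit \
    --conf spark.executor.extraClassPath=/opt/hbase/lib/hbase-client.jar:/opt/hbase/lib/hbase-common.jar:/opt/hbase/lib/hbase-protocol.jar \
    --conf spark.driver.extraClassPath=/opt/hbase/lib/hbase-client.jar:/opt/hbase/lib/hbase-common.jar:/opt/hbase/lib/hbase-protocol.jar \
    --class com.example.HBaseJob \
    myapp.jar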