Can you show us the code for loading data from Hive into HBase?
There shouldn't be a 'return' statement in that code.
Cheers
> On Jun 20, 2015, at 10:10 PM, Nishant Patel wrote:
>
> Hi,
>
> I am loading data from a Hive table into HBase after doing some manipulation.
>
> I am getting an error: 'Task not
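A minimal sketch of the pattern being discussed, assuming the truncated error is the usual task-serialization failure (the table, column names and transformation are placeholders, not the original poster's code):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveToHBasePrep {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hive-to-hbase"))
    val hiveCtx = new HiveContext(sc)

    // Placeholder query; the real job would select from the actual Hive table.
    val rows = hiveCtx.sql("SELECT key, value FROM source_table")

    // A 'return' inside this closure would be a non-local return from main()
    // and is the kind of statement the advice above says to remove. Express
    // the per-row logic as an expression instead:
    val transformed = rows.map { r =>
      val v = if (r.isNullAt(1)) "" else r.getString(1)
      (r.getString(0), v.toUpperCase)
    }

    transformed.take(5).foreach(println)
    sc.stop()
  }
}

The HBase write itself (e.g. via TableOutputFormat) is omitted here; the point is only that the closure passed to map contains no 'return'.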
Hi
Is it true that if I want to use Spark SQL (for Spark 1.3.1) against Apache
Hive, I need to build Spark from source?
I'm using CDH 5.3 on CentOS Linux 6.5, which uses Hive 0.13.0 (I think).
cheers
Mike F
Hi,
How can I know how much memory is needed for each executor (one core) to
execute each job? If there are many cores per executor, will the memory
needed be the product (memory needed for each single-core executor * no.
of cores)?
Any suggestions/guidelines?
BR,
Patcharee
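A sketch of where these settings live, with purely illustrative values (this is not a sizing recommendation):

import org.apache.spark.{SparkConf, SparkContext}

// spark.executor.memory is the heap for the whole executor and is shared by
// all of its cores, so an executor running 4 concurrent tasks generally needs
// more memory than a single-core one -- roughly, but not exactly, 4x, since
// some structures (broadcast variables, cached blocks) are shared.
val conf = new SparkConf()
  .setAppName("memory-sizing-example")
  .set("spark.executor.cores", "4")      // concurrent tasks per executor
  .set("spark.executor.memory", "8g")    // heap shared by those 4 tasks

val sc = new SparkContext(conf)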
---
Hi, all:
We have our own version of Hive 0.13.1; we altered the code around the permissions for
operating on tables and around a Hive 0.13.1 issue, HIVE-6131.
Spark 1.4.0 supports different versions of the Hive metastore; can anyone give an
example?
I am confused by these:
spark.sql.hive.metastore.jars
spark.sql.hive.metastore.version
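For reference, a sketch of how the two options might be set in Spark 1.4 (the jar path is a placeholder for wherever the patched Hive 0.13.1 jars and their dependencies live):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val conf = new SparkConf()
  .setAppName("custom-metastore-example")
  .set("spark.sql.hive.metastore.version", "0.13.1")
  .set("spark.sql.hive.metastore.jars", "/path/to/hive-0.13.1-jars/*")

val sc = new SparkContext(conf)
val hiveCtx = new HiveContext(sc)
hiveCtx.sql("SHOW TABLES").show()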
Thanks, it is ok now.

------------------ Original message ------------------
From: "Gavin Yue";
Date: June 21, 2015 (Sun), 4:40
To: "Sea" <261810...@qq.com>;
Cc: "user";
Subject: Re: Abount Jobs UI in yarn-client mode
I got the same problem when I upgraded from 1.3.1 to 1.4.
The
I've seen a few places where it's been mentioned that after a shuffle each
reducer needs to pull its partition into memory in its entirety. Is this
true? I'd assume the merge sort that needs to be done (in the cases where
sortByKey() is not used) wouldn't need to pull all of the data into memory
at
Hi,
I am trying to rewrite my program to use DataFrames, and I see that I can
perform a mapPartitions and a foreachPartition, but can I perform a
partitionBy/set a partitioner? Or is there some other way to make my data
land in the right partition for *Partition to use? (I see that PartitionBy
is
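One workaround sketch, not the DataFrame API itself: drop to the underlying RDD[Row], key it by the routing column (here assumed to be column 0 of a hypothetical "events" table), apply an explicit partitioner, and run the per-partition logic there. sqlContext is assumed to be an existing SQLContext or HiveContext.

import org.apache.spark.HashPartitioner

val df = sqlContext.table("events")

// key each row by the column the data should be routed on
val keyed = df.rdd.map(row => (row.getString(0), row))

// rows with equal keys now land in the same partition
val partitioned = keyed.partitionBy(new HashPartitioner(48))

val summarized = partitioned.mapPartitions { iter =>
  // per-partition work goes here; all rows for a given key are local
  iter.map { case (k, _) => (k, 1) }
}

summarized.take(5).foreach(println)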
I've spent the last 3 days trying to get a connection to YARN from Spark on
a single box to work through examples. I'm at a loss.
It's a dual-core box running Debian Jessie. I've tried both Java 7 and
Java 8 from Oracle. It has Hadoop 2.7 installed and YARN running. Scala
version 2.10.4, and Spar
The spark docs section for "JDBC to Other Databases"
(https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases)
describes the partitioning as "... Notice that lowerBound and upperBound
are just used to decide the partition stride, not for filtering the rows
in table".
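A sketch of that call with placeholder connection details, to make the stride behaviour concrete: the ids are split into 4 strides between 1 and 1,000,000, and rows outside that range are still read -- they simply all fall into the first or last partition.

import java.util.Properties

val props = new Properties()
props.setProperty("user", "reader")
props.setProperty("password", "secret")

val df = sqlContext.read.jdbc(
  "jdbc:postgresql://dbhost:5432/mydb",  // url (placeholder)
  "events",                              // table (placeholder)
  "id",                                  // numeric partition column
  1L,                                    // lowerBound
  1000000L,                              // upperBound
  4,                                     // numPartitions
  props)

println(df.rdd.partitions.length)        // expect 4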
StreamingContext.sparkContext()
On 21 June 2015 at 21:32, Will Briggs wrote:
> It sounds like accumulators are not necessary in Spark Streaming - see
> this post (
> http://apache-spark-user-list.1001560.n3.nabble.com/Shared-variable-in-Spark-Streaming-td11762.html)
> for more details.
>
>
> On
It sounds like accumulators are not necessary in Spark Streaming - see this
post (
http://apache-spark-user-list.1001560.n3.nabble.com/Shared-variable-in-Spark-Streaming-td11762.html)
for more details.
On June 21, 2015, at 7:31 PM, anshu shukla wrote:
In Spark Streaming, since we are already
In Spark Streaming, since we already have a StreamingContext, which
does not allow us to create accumulators, we have to get a SparkContext for
initializing the accumulator value.
But having two Spark contexts will not solve the problem.
Please help!
--
Thanks & Regards,
Anshu Shukla
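A minimal sketch of the suggestion above: there is no need for a second SparkContext, since the StreamingContext exposes the one it was built on, and the accumulator can be created from that. Host and port are placeholders.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sc = new SparkContext(new SparkConf().setAppName("acc-in-streaming"))
val ssc = new StreamingContext(sc, Seconds(5))

val recordCount = ssc.sparkContext.accumulator(0L, "records")

ssc.socketTextStream("localhost", 9999).foreachRDD { rdd =>
  rdd.foreach(_ => recordCount += 1L)                // updated on the executors
  println(s"records so far: ${recordCount.value}")   // read back on the driver
}

ssc.start()
ssc.awaitTermination()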
I want to log the timestamp of every element of the RDD, so I have
assigned a MsgId to every element inside the RDD and incremented
it (a static variable).
My code gives distinct MsgIds in local mode, but in cluster mode
this value is duplicated every 30-40 counts.
Please help!
//public static long ms
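A sketch of an alternative without shared mutable state: the static field lives per JVM, so each executor increments its own copy, which is why duplicates appear in cluster mode. zipWithUniqueId assigns ids that are distinct across the whole RDD (zipWithIndex gives consecutive ids at the cost of an extra job). The input path and case class are placeholders.

case class Tagged(msgId: Long, payload: String, ts: Long)

val lines = sc.textFile("hdfs:///input/messages")

val tagged = lines.zipWithUniqueId().map { case (payload, id) =>
  Tagged(id, payload, System.currentTimeMillis())   // timestamp taken on the executor
}

tagged.take(5).foreach(println)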
Out of curiosity, why Netty?
What model are you serving?
Velox doesn't look like it is optimized for cases like ALS recs, if that's
what you mean. I think scoring ALS at scale in real time takes a fairly
different approach.
The servlet engine probably doesn't matter at all in comparison.
On Sat, Ju
hi,
I'm trying to set up a standalone server, and in one of my tests, I got the
following exception:
java.io.IOException: Can't make directory for path
's3n://ww-sandbox/name_of_path' since it is a file.
at
org.apache.hadoop.fs.s3native.NativeS3FileSystem.mkdir(NativeS3FileSystem.java:541)
Hi Spark Experts
I have a customer who wants to monitor incoming data files (in XML format),
analyze them, and then put the analyzed data into a DB. The size of
each file is about 30 MB (or even less in the future). Spark Streaming seems
promising.
After learning Spark Streaming and also googl
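A minimal sketch of the file-monitoring part only, assuming the files land in a directory Spark can watch; the directory, batch interval, XML parsing and DB sink are all placeholders for whatever the real pipeline does.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Minutes, StreamingContext}

val sc = new SparkContext(new SparkConf().setAppName("xml-file-monitor"))
val ssc = new StreamingContext(sc, Minutes(1))

// picks up files that appear in the directory after the stream starts
val lines = ssc.textFileStream("hdfs:///incoming/xml")

lines.foreachRDD { rdd =>
  if (!rdd.isEmpty()) {
    // val records = rdd.map(parseXml)          // hypothetical XML parsing
    // records.foreachPartition(writeToDb)      // hypothetical DB sink
    println(s"picked up ${rdd.count()} lines this batch")
  }
}

ssc.start()
ssc.awaitTermination()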
> Something like this works (or at least worked with Titan 0.4 back when I
> was using it):
>
>
> val graph = sc.newAPIHadoopRDD(
> configuration,
> fClass = classOf[TitanHBaseInputFormat],
> kClass = classOf[NullWritable],
> vClass = classOf[FaunusVertex])
> graph.flatMap { vertex
Have a look at
http://s3.thinkaurelius.com/docs/titan/0.5.0/titan-io-format.html You could
use those Input/Output formats with newAPIHadoopRDD api call.
Thanks
Best Regards
On Sun, Jun 21, 2015 at 8:50 PM, Madabhattula Rajesh Kumar <
mrajaf...@gmail.com> wrote:
> Hi,
>
> How to connect TItan dat
> On 20 Jun 2015, at 17:37, Ashish Soni wrote:
>
> Can anyone help? I am getting the below error when I try to start the History
> Server.
> I do not see any org.apache.spark.deploy.yarn.history package inside the
> assembly jar; not sure how to get that.
>
> java.lang.ClassNotFoundException:
Hi,
How can I connect to a Titan database from Spark? Are any out-of-the-box APIs
available?
Regards,
Rajesh
The compiled jar is not consistent with the Python source; maybe you are
using an older version of PySpark, but with the assembly jar of Spark Core
1.4?
On Sun, Jun 21, 2015 at 7:24 AM, Shaanan Cohney wrote:
>
> Hi all,
>
>
> I'm having an issue running some code that works on a build of spark I made
> (and
Hi all,
I'm having an issue running some code that works on a build of Spark I made
(and still have), but rebuilding it again now, I get the traceback below. I
built it using the 1.4.0 release, with the hadoop-2.4 profile but Hadoop version 2.7, and
I'm using Python 3. It's not vital to my work (as I can use my oth
If you look at your streaming app UI you should see how many tasks are executed
in each batch and on how many executors. This depends on the batch duration
and the block interval, which defaults to 200ms. A partition is generated for
every block interval. You can control the parallelism by adj
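A sketch of those two knobs with illustrative values: with a 2-second batch and a 200ms block interval a receiver produces about 2000 / 200 = 10 blocks, hence roughly 10 partitions (and tasks) per batch; repartition can widen that explicitly.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("streaming-parallelism")
  .set("spark.streaming.blockInterval", "200ms")

val ssc = new StreamingContext(new SparkContext(conf), Seconds(2))

val stream = ssc.socketTextStream("localhost", 9999)

// if the receiver still produces too few blocks, spread the data explicitly
val widened = stream.repartition(16)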