Can you show us the code for loading data from Hive into HBase?
There shouldn't be a 'return' statement in that code.
Cheers
> On Jun 20, 2015, at 10:10 PM, Nishant Patel wrote:
>
> Hi,
>
> I am loading data from a Hive table into HBase after doing some manipulation.
>
> I am getting an error: 'Task not
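A minimal sketch of the pattern being discussed, assuming the truncated error is the usual task-serialization failure (the table, column names and transformation are placeholders, not the original poster's code):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveToHBasePrep {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hive-to-hbase"))
    val hiveCtx = new HiveContext(sc)

    // Placeholder query; the real job would select from the actual Hive table.
    val rows = hiveCtx.sql("SELECT key, value FROM source_table")

    // A 'return' inside this closure would be a non-local return from main()
    // and is the kind of statement the advice above says to remove. Express
    // the per-row logic as an expression instead:
    val transformed = rows.map { r =>
      val v = if (r.isNullAt(1)) "" else r.getString(1)
      (r.getString(0), v.toUpperCase)
    }

    transformed.take(5).foreach(println)
    sc.stop()
  }
}

The HBase write itself (e.g. via TableOutputFormat) is omitted here; the point is only that the closure passed to map contains no 'return'.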
Hi
Is it true that if I want to use Spark SQL (for Spark 1.3.1) against Apache
Hive, I need to build Spark from source?
I'm using CDH 5.3 on CentOS Linux 6.5, which uses Hive 0.13.0 (I think).
cheers
Mike F
Hi,
How can I know how much memory is needed for each executor (one core) to
execute each job? If there are many cores per executor, will the memory
needed be the product (memory needed for each single-core executor * no.
of cores)?
Any suggestions/guidelines?
BR,
Patcharee
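A sketch of where these settings live, with purely illustrative values (this is not a sizing recommendation):

import org.apache.spark.{SparkConf, SparkContext}

// spark.executor.memory is the heap for the whole executor and is shared by
// all of its cores, so an executor running 4 concurrent tasks generally needs
// more memory than a single-core one -- roughly, but not exactly, 4x, since
// some structures (broadcast variables, cached blocks) are shared.
val conf = new SparkConf()
  .setAppName("memory-sizing-example")
  .set("spark.executor.cores", "4")      // concurrent tasks per executor
  .set("spark.executor.memory", "8g")    // heap shared by those 4 tasks

val sc = new SparkContext(conf)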
---
Hi, all:
We have our own version of Hive 0.13.1; we altered the code around the permissions for
operating on tables and around a Hive 0.13.1 issue, HIVE-6131.
Spark 1.4.0 supports different versions of the Hive metastore; can anyone give an
example?
I am confused by these:
spark.sql.hive.metastore.jars
spark.sql.hive.metastore.version
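For reference, a sketch of how the two options might be set in Spark 1.4 (the jar path is a placeholder for wherever the patched Hive 0.13.1 jars and their dependencies live):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val conf = new SparkConf()
  .setAppName("custom-metastore-example")
  .set("spark.sql.hive.metastore.version", "0.13.1")
  .set("spark.sql.hive.metastore.jars", "/path/to/hive-0.13.1-jars/*")

val sc = new SparkContext(conf)
val hiveCtx = new HiveContext(sc)
hiveCtx.sql("SHOW TABLES").show()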
Thanks, it is ok now.

------------------ Original message ------------------
From: "Gavin Yue";
Date: June 21, 2015 (Sun), 4:40
To: "Sea" <261810...@qq.com>;
Cc: "user";
Subject: Re: Abount Jobs UI in yarn-client mode
I got the same problem when I upgraded from 1.3.1 to 1.4.
The
I've seen a few places where it's been mentioned that after a shuffle each
reducer needs to pull its partition into memory in its entirety. Is this
true? I'd assume the merge sort that needs to be done (in the cases where
sortByKey() is not used) wouldn't need to pull all of the data into memory
at
Hi,
I am trying to rewrite my program to use DataFrames, and I see that I can
perform a mapPartitions and a foreachPartition, but can I perform a
partitionBy/set a partitioner? Or is there some other way to make my data
land in the right partition for *Partition to use? (I see that PartitionBy
is
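One workaround sketch, not the DataFrame API itself: drop to the underlying RDD[Row], key it by the routing column (here assumed to be column 0 of a hypothetical "events" table), apply an explicit partitioner, and run the per-partition logic there. sqlContext is assumed to be an existing SQLContext or HiveContext.

import org.apache.spark.HashPartitioner

val df = sqlContext.table("events")

// key each row by the column the data should be routed on
val keyed = df.rdd.map(row => (row.getString(0), row))

// rows with equal keys now land in the same partition
val partitioned = keyed.partitionBy(new HashPartitioner(48))

val summarized = partitioned.mapPartitions { iter =>
  // per-partition work goes here; all rows for a given key are local
  iter.map { case (k, _) => (k, 1) }
}

summarized.take(5).foreach(println)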
I've spent the last 3 days trying to get a connection to YARN from Spark on
a single box to work through examples. I'm at a loss.
It's a dual-core box running Debian Jessie. I've tried both Java 7 and
Java 8 from Oracle. It has Hadoop 2.7 installed and YARN running. Scala
version 2.10.4, and Spar
The spark docs section for "JDBC to Other Databases"
(https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases)
describes the partitioning as "... Notice that lowerBound and upperBound
are just used to decide the partition stride, not for filtering the rows
in table".
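A sketch of that call with placeholder connection details, to make the stride behaviour concrete: the ids are split into 4 strides between 1 and 1,000,000, and rows outside that range are still read -- they simply all fall into the first or last partition.

import java.util.Properties

val props = new Properties()
props.setProperty("user", "reader")
props.setProperty("password", "secret")

val df = sqlContext.read.jdbc(
  "jdbc:postgresql://dbhost:5432/mydb",  // url (placeholder)
  "events",                              // table (placeholder)
  "id",                                  // numeric partition column
  1L,                                    // lowerBound
  1000000L,                              // upperBound
  4,                                     // numPartitions
  props)

println(df.rdd.partitions.length)        // expect 4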
StreamingContext.sparkContext()
On 21 June 2015 at 21:32, Will Briggs wrote:
> It sounds like accumulators are not necessary in Spark Streaming - see
> this post (
> http://apache-spark-user-list.1001560.n3.nabble.com/Shared-variable-in-Spark-Streaming-td11762.html)
> for more details.
>
>
> On
It sounds like accumulators are not necessary in Spark Streaming - see this
post (
http://apache-spark-user-list.1001560.n3.nabble.com/Shared-variable-in-Spark-Streaming-td11762.html)
for more details.
On June 21, 2015, at 7:31 PM, anshu shukla wrote:
In Spark Streaming, since we are already
In Spark Streaming, since we already have a StreamingContext, which
does not allow us to create accumulators, we have to get a SparkContext for
initializing the accumulator value.
But having two Spark contexts will not solve the problem.
Please help!
--
Thanks & Regards,
Anshu Shukla
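A minimal sketch of the suggestion above: there is no need for a second SparkContext, since the StreamingContext exposes the one it was built on, and the accumulator can be created from that. Host and port are placeholders.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sc = new SparkContext(new SparkConf().setAppName("acc-in-streaming"))
val ssc = new StreamingContext(sc, Seconds(5))

val recordCount = ssc.sparkContext.accumulator(0L, "records")

ssc.socketTextStream("localhost", 9999).foreachRDD { rdd =>
  rdd.foreach(_ => recordCount += 1L)                // updated on the executors
  println(s"records so far: ${recordCount.value}")   // read back on the driver
}

ssc.start()
ssc.awaitTermination()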
I want to log the timestamp of every element of the RDD, so I have
assigned a MsgId to every element inside the RDD and incremented
it (a static variable).
My code gives distinct MsgIds in local mode, but in cluster mode
this value is duplicated every 30-40 counts.
Please help!
//public static long ms
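A sketch of an alternative without shared mutable state: the static field lives per JVM, so each executor increments its own copy, which is why duplicates appear in cluster mode. zipWithUniqueId assigns ids that are distinct across the whole RDD (zipWithIndex gives consecutive ids at the cost of an extra job). The input path and case class are placeholders.

case class Tagged(msgId: Long, payload: String, ts: Long)

val lines = sc.textFile("hdfs:///input/messages")

val tagged = lines.zipWithUniqueId().map { case (payload, id) =>
  Tagged(id, payload, System.currentTimeMillis())   // timestamp taken on the executor
}

tagged.take(5).foreach(println)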
Out of curiosity, why Netty?
What model are you serving?
Velox doesn't look like it is optimized for cases like ALS recs, if that's
what you mean. I think scoring ALS at scale in real time takes a fairly
different approach.
The servlet engine probably doesn't matter at all in comparison.
On Sat, Ju
hi,
I'm trying to set up a standalone server, and in one of my tests, I got the
following exception:
java.io.IOException: Can't make directory for path
's3n://ww-sandbox/name_of_path' since it is a file.
at
org.apache.hadoop.fs.s3native.NativeS3FileSystem.mkdir(NativeS3FileSystem.java:541)
Hi Spark Experts
I have a customer who wants to monitor incoming data files (in XML format),
analyze them, and then put the analyzed data into a DB. The size of
each file is about 30 MB (or even less in the future). Spark Streaming seems
promising.
After learning Spark Streaming and also googl
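A minimal sketch of the file-monitoring part only, assuming the files land in a directory Spark can watch; the directory, batch interval, XML parsing and DB sink are all placeholders for whatever the real pipeline does.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Minutes, StreamingContext}

val sc = new SparkContext(new SparkConf().setAppName("xml-file-monitor"))
val ssc = new StreamingContext(sc, Minutes(1))

// picks up files that appear in the directory after the stream starts
val lines = ssc.textFileStream("hdfs:///incoming/xml")

lines.foreachRDD { rdd =>
  if (!rdd.isEmpty()) {
    // val records = rdd.map(parseXml)          // hypothetical XML parsing
    // records.foreachPartition(writeToDb)      // hypothetical DB sink
    println(s"picked up ${rdd.count()} lines this batch")
  }
}

ssc.start()
ssc.awaitTermination()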
> Something like this works (or at least worked with Titan 0.4 back when I
> was using it):
>
>
> val graph = sc.newAPIHadoopRDD(
> configuration,
> fClass = classOf[TitanHBaseInputFormat],
> kClass = classOf[NullWritable],
> vClass = classOf[FaunusVertex])
> graph.flatMap { vertex
Have a look at
http://s3.thinkaurelius.com/docs/titan/0.5.0/titan-io-format.html You could
use those Input/Output formats with newAPIHadoopRDD api call.
Thanks
Best Regards
On Sun, Jun 21, 2015 at 8:50 PM, Madabhattula Rajesh Kumar <
mrajaf...@gmail.com> wrote:
> Hi,
>
> How to connect TItan dat
> On 20 Jun 2015, at 17:37, Ashish Soni wrote:
>
> Can anyone help? I am getting the below error when I try to start the History
> Server.
> I do not see any org.apache.spark.deploy.yarn.history package inside the
> assembly jar; not sure how to get that.
>
> java.lang.ClassNotFoundException:
Hi,
How can I connect to a Titan database from Spark? Are any out-of-the-box APIs
available?
Regards,
Rajesh
The compiled jar is not consistent with the Python source; maybe you are
using an older version of PySpark, but with the assembly jar of Spark Core
1.4?
On Sun, Jun 21, 2015 at 7:24 AM, Shaanan Cohney wrote:
>
> Hi all,
>
>
> I'm having an issue running some code that works on a build of spark I made
> (and
Hi all,
I'm having an issue running some code that works on a build of Spark I made
(and still have), but rebuilding it again now, I get the traceback below. I
built it using the 1.4.0 release, with the hadoop-2.4 profile but Hadoop version 2.7, and
I'm using Python 3. It's not vital to my work (as I can use my oth
If you look at your streaming app UI you should see how many tasks are executed
in each batch and on how many executors. This depends on the batch duration
and the block interval, which defaults to 200ms. A partition is generated for
every block interval. You can control the parallelism by adj
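A sketch of those two knobs with illustrative values: with a 2-second batch and a 200ms block interval a receiver produces about 2000 / 200 = 10 blocks, hence roughly 10 partitions (and tasks) per batch; repartition can widen that explicitly.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("streaming-parallelism")
  .set("spark.streaming.blockInterval", "200ms")

val ssc = new StreamingContext(new SparkContext(conf), Seconds(2))

val stream = ssc.socketTextStream("localhost", 9999)

// if the receiver still produces too few blocks, spread the data explicitly
val widened = stream.repartition(16)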