> On Oct 4, 2017, at 2:08 AM, Nicolas Paris wrote:
>
> Hi
>
> I wonder about the differences between accessing HIVE tables in two different ways:
> - with jdbc access
> - with sparkContext
>
> I would say that JDBC is better since it uses HIVE, which is based on
> map-reduce / TEZ, and then works on
Hi All,
I am constantly hitting an error: "ApplicationMaster:
SparkContext did not initialize after waiting for 100 ms" while running my
Spark code in yarn cluster mode.
Here is the command I am using: spark-submit --master yarn
--deploy-mode cluster spark_code.py
I suggest you read "Hadoop Application Architectures" (O'Reilly) by Mark Grover,
Ted Malaska and others. There you can find answers to your questions.
> On Oct 10, 2017, at 9:00, Mahender Sarangam wrote:
>
> Hi,
>
> I'm new to spark and big data, we
I need to have a location column inside my Dataframe so that I can do
spatial queries and geometry operations. Are there any third-party packages
that perform these kinds of operations? I have seen a few, like GeoSpark and
Magellan, but they don't support operations where spatial and logical
operators
Hi,
I'm trying to read a 60GB HDFS file using spark textFile("hdfs_file_path",
minPartitions).
How can I control the number of tasks by increasing the split size? With the
default split size of 250 MB, several tasks are created. But I would like
to have a specific number of tasks created while reading from
I have also tried these, and none of them actually compile.
dataset.map(new MapFunction<Row, Seq<String>>() { // generics stripped by the archive; Row and Seq<String> assumed
@Override
public Seq<String>
Write your own input format/datasource or split the file yourself beforehand
(not recommended).
> On 10. Oct 2017, at 09:14, Kanagha Kumar wrote:
>
> Hi,
>
> I'm trying to read a 60GB HDFS file using spark textFile("hdfs_file_path",
> minPartitions).
>
> How can I
That is not correct, IMHO. If I am not wrong, Spark will still load data in
the executor, running some stats on the data itself to identify
partitions
On Tue, Oct 10, 2017 at 9:23 PM, 郭鹏飞 wrote:
>
> > On Oct 4, 2017, at 2:08 AM, Nicolas Paris wrote:
> >
>
I have not tested this, but you should be able to pass any map-reduce-like
conf on to the underlying Hadoop config. Essentially you should be able to
control the split behaviour as you can in a map-reduce program (as Spark
uses the same input format).
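A minimal, untested sketch of that idea; the property name comes from the
new mapreduce API, and the 1 GB target split size is an assumption:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SplitSizeExample {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("split-size-example");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Settings pushed into the SparkContext's Hadoop configuration are
            // seen by the input format that textFile() uses to plan splits.
            sc.hadoopConfiguration().set(
                "mapreduce.input.fileinputformat.split.minsize",
                String.valueOf(1024L * 1024 * 1024)); // aim for ~1 GB per split

            JavaRDD<String> lines = sc.textFile("hdfs_file_path");
            System.out.println("partitions: " + lines.getNumPartitions());
            sc.stop();
        }
    }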
On Tue, Oct 10, 2017 at 10:21 PM, Jörn Franke
Hi
My environment:
Windows 10,
Spark 1.6.1 built for Hadoop 2.6.0 Build
Python 2.7
Java 1.8
Issue:
Go to C:\Spark
The command:
bin\spark-submit --master local C:\Spark\examples\src\main\python\pi.py 10
gives:
File "", line 1
bin\spark-submit --master local
Hi,
Which spatial operations do you require exactly? Also, I don't follow what
you mean by combining logical operators.
I have created a library that wraps Lucene's spatial functionality here:
https://github.com/zouzias/spark-lucenerdd/wiki/Spatial-search
You could give the library a try; it
Hi all,
GeoMesa integrates with Spark SQL and allows for queries like:
select * from chicago where case_number = 1 and st_intersects(geom,
st_makeBox2d(st_point(-77, 38), st_point(-76, 39)))
GeoMesa does this by calling package-protected Spark methods to
implement geospatial user-defined
There are a number of packages for geospatial analysis, depending on the features
you need. Here are a few I know of and/or have used:
Magellan: https://github.com/harsha2010/magellan
MrGeo: https://github.com/ngageoint/mrgeo
GeoMesa: http://www.geomesa.org/documentation/tutorials/spark.html
Something along the lines of:
Dataset<Row> df = spark.read().json(jsonDf); ?
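A fuller, untested sketch of that suggestion; the sample JSON and column
names are invented, and Spark 2.2's DataFrameReader.json is the variant
that accepts a Dataset<String> of JSON strings:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class JsonToColumns {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                .appName("json-to-columns").getOrCreate();

            // Stand-in for the original jsonDf: one JSON document per element.
            Dataset<String> jsonDf = spark.createDataset(
                java.util.Arrays.asList(
                    "{\"name\":\"a\",\"value\":1}",
                    "{\"name\":\"b\",\"value\":2}"),
                Encoders.STRING());

            // Parse the JSON strings into real columns.
            Dataset<Row> df = spark.read().json(jsonDf);
            df.select("name", "value").show();

            spark.stop();
        }
    }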
From: kant kodali [mailto:kanth...@gmail.com]
Sent: Saturday, October 07, 2017 2:31 AM
To: user @spark
Subject: How to convert Array of Json rows into Dataset of specific columns in
Spark 2.2.0?
I
Try increasing the `spark.yarn.am.waitTime` parameter, it's by default set
to 100ms which might not be enough in certain cases.
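For what it's worth, the documented default of spark.yarn.am.waitTime in
recent Spark versions is 100s (the error message reports the wait in
milliseconds). A sketch of the invocation, with an assumed 300s timeout:

    spark-submit --master yarn --deploy-mode cluster \
      --conf spark.yarn.am.waitTime=300s \
      spark_code.py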
On Tue, Oct 10, 2017 at 7:02 AM, Debabrata Ghosh
wrote:
> Hi All,
> I am constantly hitting an error: "ApplicationMaster:
>
What about something like GeoMesa?
Anastasios Zouzias wrote on Tue, Oct 10, 2017 at 15:29:
> Hi,
>
> Which spatial operations do you require exactly? Also, I don't follow what
> you mean by combining logical operators?
>
> I have created a library that wraps Lucene's spatial
Thanks for the inputs!!
I passed in spark.mapred.max.split.size and spark.mapred.min.split.size set
to the size I wanted to read. They didn't have any effect.
I also tried passing in spark.dfs.block.size, with all the params set to
the same value.
That's probably better directed to AWS support.
On Sun, Oct 8, 2017 at 9:54 PM, Tushar Sudake wrote:
> Hello everyone,
>
> I'm using 'r4.8xlarge' instances on EMR for my Spark Application.
> To each node, I'm attaching one 512 GB EBS volume.
>
> By logging into
Why can't you do this in Magellan?
Can you post a sample query that you are trying to run that has spatial and
logical operators combined? Maybe I am not understanding the issue properly.
Ram
On Tue, Oct 10, 2017 at 2:21 AM, Imran Rajjad wrote:
> I need to have a location
Maybe you need to set the parameters for the mapreduce API and not the mapred
API. I do not remember offhand how they differ, but the Hadoop web page should
tell you ;-)
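For reference, the old mapred.* split settings map to these mapreduce.*
names (per Hadoop's deprecated-properties list):

    mapred.min.split.size  ->  mapreduce.input.fileinputformat.split.minsize
    mapred.max.split.size  ->  mapreduce.input.fileinputformat.split.maxsize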
> On 10. Oct 2017, at 17:53, Kanagha Kumar wrote:
>
> Thanks for the inputs!!
>
> I passed in
Have you seen this:
https://stackoverflow.com/questions/42796561/set-hadoop-configuration-values-on-spark-submit-command-line
? Please try and let us know.
On Wed, Oct 11, 2017 at 2:53 AM, Kanagha Kumar
wrote:
> Thanks for the inputs!!
>
> I passed in
Is Hive from Spark via JDBC working for you? If it does, I would be
interested in your setup :-)
We can't get this working. See bug here, especially my last comment:
https://issues.apache.org/jira/browse/SPARK-21063
Regards
Andreas
I am able to connect to Spark via JDBC - tested with Squirrel. I am referencing
all the jars of the current Spark distribution under
/usr/hdp/current/spark2-client/jars/*
Thanks,
Reema
-----Original Message-----
From: weand [mailto:andreas.we...@gmail.com]
Sent: Tuesday, October 10, 2017 5:14 PM
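In case it helps the comparison: a minimal, untested sketch of a plain JDBC
connection to the Spark Thrift Server (host, port and user are assumptions;
the Hive JDBC driver and its dependencies must be on the classpath):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class SparkThriftJdbc {
        public static void main(String[] args) throws Exception {
            // The Spark Thrift Server speaks the HiveServer2 protocol, so the
            // standard Hive JDBC URL scheme applies.
            String url = "jdbc:hive2://localhost:10000/default";
            try (Connection conn = DriverManager.getConnection(url, "user", "");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }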
Thanks Ayan!
Finally it worked!! Thanks a lot everyone for the inputs!
Once I prefixed the params with "spark.hadoop", I see the number of tasks
getting reduced.
I'm setting the following params:
--conf spark.hadoop.dfs.block.size
--conf spark.hadoop.mapreduce.input.fileinputformat.split.minsize
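Putting it together, the working invocation presumably looked something like
this (the 512 MB value and the application file name are placeholders):

    spark-submit --master yarn \
      --conf spark.hadoop.dfs.block.size=536870912 \
      --conf spark.hadoop.mapreduce.input.fileinputformat.split.minsize=536870912 \
      your_app.py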
Hi,
I do not think that Spark will automatically determine the partitions;
in fact, it does not. If a table has a few million records, it all goes
through the driver.
Of course, I have only tried JDBC connections with Aurora, Oracle and Postgres.
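To make that concrete: Spark's JDBC source only parallelizes a read when
asked to. A minimal, untested sketch (URL, table, column and bounds are
invented for illustration):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class JdbcPartitionedRead {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                .appName("jdbc-partitioned-read").getOrCreate();

            // Without partitionColumn/lowerBound/upperBound/numPartitions,
            // the whole table is pulled through a single connection.
            Dataset<Row> df = spark.read()
                .format("jdbc")
                .option("url", "jdbc:postgresql://dbhost:5432/mydb")
                .option("dbtable", "events")
                .option("user", "reader")
                .option("partitionColumn", "id") // numeric column to range-split on
                .option("lowerBound", "1")
                .option("upperBound", "1000000")
                .option("numPartitions", "8")    // 8 parallel range queries
                .load();

            System.out.println("partitions: " + df.javaRDD().getNumPartitions());
            spark.stop();
        }
    }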
Hi,
I'm new to Spark and big data; we are doing some PoC work and building our
warehouse application using Spark. Can anyone share guidance with me, such
as naming conventions for HDFS names, table names, UDFs and DB names? Any
sample architecture diagram?
-Mahens
Thanks Vadim!
Sent from my iPhone
> On 10-Oct-2017, at 11:09 PM, Vadim Semenov
> wrote:
>
> Try increasing the `spark.yarn.am.waitTime` parameter, it's by default set to
> 100ms which might not be enough in certain cases.
>
>> On Tue, Oct 10, 2017 at 7:02 AM,