Re: about LIVY-424

2018-11-11 Thread lk_spark
I'm using Livy 0.5.0 with Spark 2.3.0. I started a session with 4GB of memory
for the driver, and I ran this code several times:

    var tmp1 = spark.sql("use tpcds_bin_partitioned_orc_2")
    var tmp2 = spark.sql("select count(1) from tpcds_bin_partitioned_orc_2.store_sales").show

The table has 5,760,749 rows. After about 10 runs, the driver's physical memory
goes beyond 4.5GB and the driver is killed by YARN.
I can see the old-generation memory keep growing, and GC cannot release it.
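A quick way to confirm from inside the session that it is the driver heap that
grows (a diagnostic sketch only, nothing Livy-specific) is to print JVM memory
between runs:

    // diagnostic only: print driver heap usage from inside the session
    val rt = Runtime.getRuntime
    println(f"driver heap used: ${(rt.totalMemory - rt.freeMemory) / (1024.0 * 1024)}%.0f MB")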

2018-11-12 

lk_spark 



From: "lk_hadoop"
Sent: 2018-11-12 09:37
Subject: about LIVY-424
To: "user"
Cc:

hi, all:
I've hit this issue: https://issues.apache.org/jira/browse/LIVY-424 . Does
anybody know how to resolve it?
2018-11-12


lk_hadoop 

Re: writing to local files on a worker

2018-11-11 Thread Jörn Franke
Can you use JNI to call the C++ functionality directly from Java?

Or you could wrap this into a MapReduce step outside Spark and use Hadoop
Streaming (it allows you to use shell scripts as the mapper and reducer)?

You can also write temporary files for each partition and execute the software 
within a map step.

Generally you should not call external applications from Spark.
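To illustrate the JNI route: if the C++ code can be compiled into a shared
library rather than a standalone executable, the call stays in-process. A
minimal sketch, assuming a hypothetical libcriticalstep.so installed on every
worker (the library name and method signature are made up for illustration):

    object NativeStep {
      System.loadLibrary("criticalstep") // expects libcriticalstep.so on java.library.path
      @native def process(input: Array[Byte]): Array[Byte] // implemented in C++ via JNI
    }

    // inside a transformation; records stands for an RDD[Array[Byte]] of your input
    val results = records.map(bytes => NativeStep.process(bytes))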

> On 11.11.2018 at 23:13, Steve Lewis wrote:
> 
> I have a problem where a critical step needs to be performed by a third-party
> C++ application. I can send or install this program on the worker nodes. I can
> construct a function holding all the data this program needs to process. The
> problem is that the program is designed to read and write from the local file
> system. I can call the program from Java, read its output as a local file, and
> then delete all temporary files, but I doubt that it is possible to get the
> program to read from HDFS or any shared file system.
> My question is: can a function running on a worker node create temporary files
> and pass their names to a local process, assuming everything is cleaned up
> after the call?
> 
> -- 
> Steven M. Lewis PhD
> 4221 105th Ave NE
> Kirkland, WA 98033
> 206-384-1340 (cell)
> Skype lordjoe_com
> 




Re: writing to local files on a worker

2018-11-11 Thread Joe

Hello,
You could try using the mapPartitions function if you can send partial data
to your C++ program:


mapPartitions(func):
Similar to map, but runs separately on each partition (block) of the RDD, so
func must be of type Iterator[T] => Iterator[U] when running on an RDD of
type T.


That way you can write the partition data to a temp file, call your C++ app,
then delete the temp file. Of course, each call would be limited to the rows
of one partition.
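A minimal sketch of that approach, assuming an RDD[String] called rdd and a
hypothetical worker-local binary /opt/tool/process that takes an input file
and an output file as arguments:

    import java.nio.file.Files
    import scala.collection.JavaConverters._
    import scala.sys.process._

    val processed = rdd.mapPartitions { rows =>
      val in  = Files.createTempFile("part-in-", ".txt")
      val out = Files.createTempFile("part-out-", ".txt")
      Files.write(in, rows.toSeq.asJava)                      // dump this partition to a local file
      try {
        Seq("/opt/tool/process", in.toString, out.toString).! // run the external app on it
        Files.readAllLines(out).asScala.toList.iterator       // materialize before deleting
      } finally {
        Files.delete(in)
        Files.delete(out)
      }
    }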


Also the latest release of Spark (2.4.0) introduced barrier execution mode:
https://issues.apache.org/jira/browse/SPARK-24374

Maybe you could combine the two: using mapPartitions alone will give you
single-partition data only, and your app call will be repeated on all nodes,
not necessarily at the same time.
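A rough sketch of the barrier part (Spark 2.4+); the external call itself
would go where the comment says, as in the temp-file sketch above:

    val out = rdd.barrier().mapPartitions { rows =>
      val ctx = org.apache.spark.BarrierTaskContext.get()
      ctx.barrier() // every task waits here until all tasks in the stage arrive
      rows          // call the external app here, as in the temp-file sketch
    }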


Spark's strong point is parallel execution, so what you're trying to do
somewhat defeats that.
But if you do not need to combine all the data before calling your app,
then you could do it.

Or you could split your job into a Spark -> app -> Spark chain.
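That chain could be as simple as staging the data out and back (the DataFrame
and paths here are hypothetical):

    // stage everything out of Spark, run the tool once out-of-band, read it back
    df.write.text("hdfs:///staging/tool-input")
    // ... run the C++ tool against an exported copy of tool-input ...
    val results = spark.read.text("hdfs:///staging/tool-output")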
Good luck,

Joe



On 11/11/2018 02:13 PM, Steve Lewis wrote:
I have a problem where a critical step needs to be performed by a third-party
C++ application. I can send or install this program on the worker nodes. I
can construct a function holding all the data this program needs to process.
The problem is that the program is designed to read and write from the local
file system. I can call the program from Java, read its output as a local
file, and then delete all temporary files, but I doubt that it is possible to
get the program to read from HDFS or any shared file system.
My question is: can a function running on a worker node create temporary
files and pass their names to a local process, assuming everything is
cleaned up after the call?


--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com







Re: Scala: The Util is not accessible in def main

2018-11-11 Thread Mark Hamstra
It is intentionally not accessible in your code since Utils is internal
Spark code, not part of the public API. Changing Spark to make that private
code public would be inviting trouble, or at least future headaches. If you
don't already know how to build and maintain your own custom fork of Spark
with those private Utils made public, then you probably shouldn't be
thinking about doing so.
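For a helper this small, the usual alternative is to reimplement it in your
own code instead of touching Spark. A sketch of an equivalent (the object
name is your choice):

    object MyUtils {
      // maps x into [0, mod) even when x % mod would be negative
      def nonNegativeMod(x: Int, mod: Int): Int = {
        val rawMod = x % mod
        rawMod + (if (rawMod < 0) mod else 0)
      }
    }

    // mirrors the call site from the original question
    val temp = tokens.map(word => MyUtils.nonNegativeMod(x, y))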

On Sun, Nov 11, 2018 at 2:13 AM Soheil Pourbafrani wrote:

> Hi,
> I want to use the org.apache.spark.util.Utils class in def main, but I get
> the error:
>
> Symbol Utils is not accessible from this place. Here is the code:
>
> val temp = tokens.map(word => Utils.nonNegativeMod(x, y))
>
> How can I make it accessible?
>


writing to local files on a worker

2018-11-11 Thread Steve Lewis
I have a problem where a critical step needs to be performed by a third-party
C++ application. I can send or install this program on the worker nodes. I
can construct a function holding all the data this program needs to process.
The problem is that the program is designed to read and write from the local
file system. I can call the program from Java, read its output as a local
file, and then delete all temporary files, but I doubt that it is possible to
get the program to read from HDFS or any shared file system.
My question is: can a function running on a worker node create temporary
files and pass their names to a local process, assuming everything is
cleaned up after the call?

-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com


Scala: The Util is not accessible in def main

2018-11-11 Thread Soheil Pourbafrani
Hi,
I want to use the org.apache.spark.util.Utils class in def main, but I get
the error:

Symbol Utils is not accessible from this place. Here is the code:

val temp = tokens.map(word => Utils.nonNegativeMod(x, y))

How can I make it accessible?