Re: about LIVY-424
I'm using livy-0.5.0 with Spark 2.3.0. I started a session with 4 GB of memory for the driver and ran the following code several times:

var tmp1 = spark.sql("use tpcds_bin_partitioned_orc_2"); var tmp2 = spark.sql("select count(1) from tpcds_bin_partitioned_orc_2.store_sales").show

The table has 5,760,749 rows. After about 10 runs, the driver's physical memory grows beyond 4.5 GB and the container is killed by YARN. I can see the old-generation heap keep growing, and GC cannot release it.

2018-11-12
lk_spark

From: "lk_hadoop"
Sent: 2018-11-12 09:37
Subject: about LIVY-424
To: "user"
Cc:

hi, all:
I am hitting this issue: https://issues.apache.org/jira/browse/LIVY-424 . Does anybody know how to resolve it?

2018-11-12
lk_hadoop
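For reference, the repro is just the two statements above run repeatedly in the Livy interactive (Scala) session; a minimal sketch, with the loop count only illustrative:

    // Run the reported statements repeatedly inside the Livy Scala session;
    // `spark` is the session's SparkSession.
    for (_ <- 1 to 10) {
      spark.sql("use tpcds_bin_partitioned_orc_2")
      spark.sql("select count(1) from tpcds_bin_partitioned_orc_2.store_sales").show()
    }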
Re: writing to local files on a worker
Can you use JNI to call the C++ functionality directly from Java? Or you could wrap this in a MapReduce step outside of Spark and use Hadoop Streaming (it allows you to use shell scripts as mapper and reducer). You could also write a temporary file for each partition and execute the software within a map step; see the sketch after the quoted message below. Generally, though, you should not call external applications from Spark.

> On 11.11.2018 at 23:13, Steve Lewis wrote:
>
> I have a problem where a critical step needs to be performed by a third-party C++ application. I can send or install this program on the worker nodes, and I can construct a function holding all the data this program needs to process. The problem is that the program is designed to read and write from the local file system. I can call the program from Java and read its output as a local file, then delete all temporary files, but I doubt it is possible to get the program to read from HDFS or any shared file system. My question is: can a function running on a worker node create temporary files and pass their names to a local process, assuming everything is cleaned up after the call?
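Regarding the temp-file-per-partition idea above, a rough sketch of how it could look; the binary path and its command-line interface are made-up placeholders, and `sc` is the SparkContext available in spark-shell:

    import java.io.{File, PrintWriter}
    import scala.io.Source
    import scala.sys.process._

    // Hypothetical example data; in spark-shell `sc` is already available.
    val inputRdd = sc.parallelize(Seq("record-1", "record-2", "record-3"), numSlices = 2)

    // For each partition: dump the rows to a local temp file, invoke the
    // external program on it, read its output file, then delete both files.
    val results = inputRdd.mapPartitions { rows =>
      val in  = File.createTempFile("part-in-", ".txt")
      val out = File.createTempFile("part-out-", ".txt")
      val writer = new PrintWriter(in)
      try rows.foreach(r => writer.println(r)) finally writer.close()

      // Placeholder binary path; assumes the program is installed on every worker.
      val exitCode = Seq("/opt/thirdparty/bin/process",
                         in.getAbsolutePath, out.getAbsolutePath).!
      require(exitCode == 0, s"external program exited with code $exitCode")

      // Materialize the output before deleting the temp files.
      val output = Source.fromFile(out).getLines().toList
      in.delete(); out.delete()
      output.iterator
    }

    results.collect().foreach(println)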
Re: writing to local files on a worker
Hello,

You could try using the mapPartitions function if you can send partial data to your C++ program:

mapPartitions(func): similar to map, but runs separately on each partition (block) of the RDD, so func must be of type Iterator<T> => Iterator<U> when running on an RDD of type T.

That way you can write the partition's data to a temp file, call your C++ app, and then delete the temp file. Of course the data it sees would be limited to the rows in one partition.

Also, the latest release of Spark (2.4.0) introduced barrier execution mode: https://issues.apache.org/jira/browse/SPARK-24374

Maybe you could combine the two: using mapPartitions alone gives you single-partition data only, and your app call will be repeated on all nodes, not necessarily at the same time. A sketch combining them follows below. Spark's strong point is parallel execution, so what you're trying to do somewhat defeats that, but if you do not need to combine all the data before calling your app, it can work. Or you could split your job into a Spark -> app -> Spark chain.

Good luck,
Joe

On 11/11/2018 02:13 PM, Steve Lewis wrote:
I have a problem where a critical step needs to be performed by a third-party C++ application. I can send or install this program on the worker nodes, and I can construct a function holding all the data this program needs to process. The problem is that the program is designed to read and write from the local file system. I can call the program from Java and read its output as a local file, then delete all temporary files, but I doubt it is possible to get the program to read from HDFS or any shared file system. My question is: can a function running on a worker node create temporary files and pass their names to a local process, assuming everything is cleaned up after the call?
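A rough sketch of combining mapPartitions with barrier execution mode; it assumes Spark 2.4+, the spark-shell `sc`, and uses a simple map as a stand-in for the real external call:

    import org.apache.spark.BarrierTaskContext

    val rdd = sc.parallelize(Seq("a", "b", "c", "d"), numSlices = 2)

    val processed = rdd.barrier().mapPartitions { rows =>
      val ctx = BarrierTaskContext.get()
      ctx.barrier()                                      // every task of the stage has started
      val out = rows.map(r => "processed:" + r).toList   // call your C++ app here instead
      ctx.barrier()                                      // every task has finished the call
      out.iterator
    }

    processed.collect().foreach(println)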
Re: Scala: The Util is not accessible in def main
It is intentionally not accessible in your code, since Utils is internal Spark code, not part of the public API. Changing Spark to make that private code public would be inviting trouble, or at least future headaches. If you don't already know how to build and maintain your own custom fork of Spark with those private Utils made public, then you probably shouldn't be thinking about doing so. A simpler option is sketched below.

On Sun, Nov 11, 2018 at 2:13 AM Soheil Pourbafrani wrote:
> Hi,
> I want to use the org.apache.spark.util.Utils library in def main, but I get the error:
>
> Symbol Utils is not accessible from this place. Here is the code:
>
> val temp = tokens.map(word => Utils.nonNegativeMod(x, y))
>
> How can I make it accessible?
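If all you need from Utils is nonNegativeMod, the simpler route is to write the one-liner yourself instead of reaching into Spark internals. A minimal stand-in with the same contract (a modulo result that is never negative), plus a hypothetical usage loosely mirroring the snippet from the question:

    object MyUtils {
      // A modulo that never returns a negative value, useful for mapping
      // hash codes to bucket or partition indexes.
      def nonNegativeMod(x: Int, mod: Int): Int = {
        val rawMod = x % mod
        rawMod + (if (rawMod < 0) mod else 0)
      }
    }

    // Hypothetical usage; `tokens` and `numBuckets` are placeholder values.
    val tokens = Seq("spark", "livy", "hadoop")
    val numBuckets = 4
    val temp = tokens.map(word => MyUtils.nonNegativeMod(word.hashCode, numBuckets))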
writing to local files on a worker
I have a problem where a critical step needs to be performed by a third-party C++ application. I can send or install this program on the worker nodes, and I can construct a function holding all the data this program needs to process. The problem is that the program is designed to read and write from the local file system. I can call the program from Java and read its output as a local file, then delete all temporary files, but I doubt it is possible to get the program to read from HDFS or any shared file system. My question is: can a function running on a worker node create temporary files and pass their names to a local process, assuming everything is cleaned up after the call?

--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
Scala: The Util is not accessible in def main
Hi,
I want to use the org.apache.spark.util.Utils library in def main, but I get the error:

Symbol Utils is not accessible from this place.

Here is the code:

val temp = tokens.map(word => Utils.nonNegativeMod(x, y))

How can I make it accessible?