Re: Is RankingMetrics' NDCG implementation correct?

2016-09-19 Thread Jong Wook Kim
so don't think it's valid to use it as such. > On Mon, Sep 19, 2016 at 4:42 AM, Jong Wook Kim <jongw...@nyu.edu> wrote: > Hi, I'm trying to evaluate a recommendation model, and found that Spark and Rival give different results, and it seems that

Is RankingMetrics' NDCG implementation correct?

2016-09-18 Thread Jong Wook Kim
Hi, I'm trying to evaluate a recommendation model, and found that Spark and Rival give different results, and it seems that Rival's is what Kaggle defines:
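For reference, a minimal sketch of the commonly cited binary-relevance NDCG@k (the definition Kaggle's evaluation pages describe). This is an illustration, not Spark's RankingMetrics code; the function name and the exact normalization convention (ideal DCG over min(k, |relevant|)) are assumptions of this sketch:

```python
import math

def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG@k: each relevant hit at rank r contributes
    1/log2(r+1), and the sum is normalized by the ideal (best-possible) DCG."""
    relevant = set(relevant)
    dcg = sum(1.0 / math.log2(i + 2)          # item at index i has rank i+1
              for i, item in enumerate(recommended[:k])
              if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2)        # all relevant items ranked first
                for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0

# A perfect ranking scores 1.0; pushing a relevant item down lowers the score.
print(ndcg_at_k(["a", "b", "c"], {"a", "b", "c"}, 3))  # → 1.0
```

Discrepancies between libraries usually come down to the discount base, whether ranks start at 0 or 1, and how the ideal DCG is truncated, so comparing implementations term by term against a tiny example like this is a quick way to locate the difference.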

Re: AVRO vs Parquet

2016-03-03 Thread Jong Wook Kim
How about ORC? I have experimented briefly with Parquet and ORC, and I liked the fact that ORC has its schema within the file, which makes it handy to work with other tools. Jong Wook On 3 March 2016 at 23:29, Don Drake wrote: > My tests show Parquet has better

Spark-shell connecting to Mesos stuck at sched.cpp

2015-11-15 Thread Jong Wook Kim
I'm having a problem connecting my Spark app to a Mesos cluster; any help on the question below would be appreciated. http://stackoverflow.com/questions/33727154/spark-shell-connecting-to-mesos-stuck-at-sched-cpp Thanks, Jong Wook

Spark YARN Shuffle service wire compatibility

2015-10-22 Thread Jong Wook Kim
Hi, I’d like to know if there is a guarantee that the Spark YARN shuffle service has wire compatibility between 1.x versions. I could run a Spark 1.5 job with YARN nodemanagers running shuffle service 1.4, but it might’ve been just a coincidence. Now we’re upgrading CDH from 5.3 to 5.4, whose

Re: About extra memory on yarn mode

2015-07-14 Thread Jong Wook Kim
executor.memory only sets the maximum heap size of the executor; the JVM needs non-heap memory to store class metadata, interned strings, and other native overheads coming from networking libraries, off-heap storage levels, etc. These are (of course) legitimate uses of resources and you'll have
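The point above can be made concrete with a little arithmetic: on YARN, the container request is roughly the executor heap plus the off-heap overhead. The sketch below assumes the 10% fraction and 384 MB floor that some Spark 1.x releases used as the default for spark.yarn.executor.memoryOverhead; check the docs for your exact version:

```python
# Rough sketch of how a YARN executor container is sized: heap plus
# off-heap overhead. The 10% factor and 384 MB floor are assumptions
# modeled on spark.yarn.executor.memoryOverhead defaults in Spark 1.x.
OVERHEAD_FRACTION = 0.10
OVERHEAD_MIN_MB = 384

def container_size_mb(executor_memory_mb, overhead_mb=None):
    if overhead_mb is None:  # fall back to the default formula
        overhead_mb = max(OVERHEAD_MIN_MB, int(executor_memory_mb * OVERHEAD_FRACTION))
    return executor_memory_mb + overhead_mb

print(container_size_mb(4096))  # 4 GB heap → 4096 + 409 = 4505 MB requested
print(container_size_mb(2048))  # small heaps hit the 384 MB floor → 2432 MB
```

The practical consequence: if the process's total resident memory (heap plus all the native overheads listed above) exceeds the container size, YARN kills the container, which is why raising the overhead setting rather than the heap is often the fix.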

Re: ProcessBuilder in SparkLauncher is memory inefficient for launching new process

2015-07-14 Thread Jong Wook Kim
The article you've linked is specific to an embedded system; the JVM built for that architecture (which the author didn't mention) might not be as stable and well-supported as HotSpot. ProcessBuilder is a stable Java API, and despite somewhat limited functionality it is the standard method to

Re: How to maintain multiple JavaRDD created within another method like javaStreamRDD.forEachRDD

2015-07-14 Thread Jong Wook Kim
Your question is not very clear, but from what I understand, you want to deal with a stream of MyTable that has parsed records from your Kafka topics. What you need is JavaDStream<MyTable>, and you can use transform()

Re: RECEIVED SIGNAL 15: SIGTERM

2015-07-12 Thread Jong Wook Kim
Based on my experience, YARN containers can get SIGTERM when - they produce too many logs and use up the hard drive - they use more off-heap memory than what is given by the spark.yarn.executor.memoryOverhead configuration. It might be due to too many classes loaded (less than MaxPermGen but more

Streaming checkpoints and logic change

2015-07-08 Thread Jong Wook Kim
I just asked this question at the streaming webinar that just ended, but the speakers didn't answer, so I'm throwing it here: AFAIK checkpoints are the only recommended method for running Spark streaming without data loss. But it involves serializing the entire dstream graph, which prohibits any logic

Re: Streaming checkpoints and logic change

2015-07-08 Thread Jong Wook Kim
, and as the transform function is processed in every batch interval, it will always use the latest filters. HTH. TD On Wed, Jul 8, 2015 at 10:02 AM, Jong Wook Kim jongw...@nyu.edu wrote: I just asked this question at the streaming webinar that just ended, but the speakers didn't answer, so

Re: Custom streaming receiver slow on YARN

2015-02-09 Thread Jong Wook Kim
replying to my own thread; I realized that this only happens when the replication level is 1. Regardless of setting MEMORY_ONLY, disk, or deserialized storage levels, I had to set the replication level to 2 to make streaming work properly on YARN. I still don't get why, because intuitively less

Re: saveAsTextFile of RDD[Array[Any]]

2015-02-09 Thread Jong Wook Kim
If you have `RDD[Array[Any]]` you can do `rdd.map(_.mkString("\t"))`, or use some other delimiter, to make it `RDD[String]`, and then call `saveAsTextFile`. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/saveAsTextFile-of-RDD-Array-Any-tp21548p21554.html
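The same idea in PySpark is a tab-join over each row. A minimal pure-Python sketch of the per-record formatting step (the function name is made up for illustration; in a real job it would be passed to rdd.map before saveAsTextFile):

```python
def to_tsv_line(row, sep="\t"):
    # Equivalent of Scala's _.mkString("\t"): stringify each field, then join.
    return sep.join(str(field) for field in row)

# In PySpark this would be applied per record, e.g.:
#   rdd.map(to_tsv_line).saveAsTextFile(output_path)
print(to_tsv_line([1, "a", 3.5]))  # fields joined by tab characters
```

Note that `str()` on arbitrary objects may not round-trip cleanly, so for anything beyond primitives a structured format (Parquet, Avro) is usually safer than delimited text.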

Custom streaming receiver slow on YARN

2015-02-07 Thread Jong Wook Kim
Hello people, I have an issue where my streaming receiver is laggy on YARN. Can anyone reply to my question on StackOverflow? http://stackoverflow.com/questions/28370362/spark-streaming-receiver-particularly-slow-on-yarn Thanks, Jong Wook