RE: A question about "data analytics"

2012-04-11 Thread Djordje Jevdjic
Hello Fu Bin-zhang, The error message is very weird because FileSplit is a class derived from InputSplit, and the conversion is legal. However, I've seen this message several times. The error is highly likely related to the location of the hadoop tmp directory. Could you please compress and se

RE: RE: A question about "data analytics"

2012-04-12 Thread Djordje Jevdjic
t.ac.cn] Sent: Thursday, April 12, 2012 3:27 PM To: Djordje Jevdjic Subject: Re: RE: A question about "data analytics" Hello Djordje, Thanks for your advice, the problem is really caused by the tmp directory. I think the reason may be that I didn't reformat the namenode after

RE: CloudSuite - Data analytics: Mahout installation and dead link for data set

2012-05-17 Thread Djordje Jevdjic
Dear Hasan, The file was removed from that link 5 days ago. Thank you for pointing that out. I corrected the link so you can try again. Regarding the other error: you have to install and deploy Mahout completely and without any errors. Please tell me which version of Maven and which version of JD

RE: Data Analytics: multithreaded vs multiprocess

2012-07-11 Thread Djordje Jevdjic
Dear Jayneel, The way the workload is set up corresponds to a typical use of the Hadoop Map-Reduce framework, which means that each map task is a separate process. The map task itself can be multithreaded, but we use Mahout's version of the classification algorithm, which is single-thre

RE: [cloudsuite] Errors installing Cloud Suite

2012-08-10 Thread Djordje Jevdjic
Dear Marcos, Thanks for your interest in CloudSuite and welcome to our mailing list. Regarding the Analytics benchmark, I can see that you made some small mistakes. In your first example, you didn't execute the whole command: $MAHOUT_HOME/bin/mahout wikipediaXMLSplitter -d $MAHOUT_HOME/example

RE: [cloudsuite] Errors running Data Analytics Benchmark

2013-01-19 Thread Djordje Jevdjic
Dear Mandanna, It seems that you are not using the correct version of Hadoop. Please use version 0.20.2 as indicated, not 0.22.0. The version you are currently using doesn't have "ProgramDriver" and several other classes needed to run this benchmark (due to the changes in the API). The easiest

RE: [cloudsuite] Error running Data analytics benchmark

2013-01-21 Thread Djordje Jevdjic
Dear Kiran, It seems that you are not using the correct version of Mahout. Please use version 0.6 as indicated. The easiest way to ensure that you use the correct version of the prerequisite packages is to download the whole benchmark from the CloudSuite website. I believe this will solve your p

RE: Question about data analytic

2013-03-22 Thread Djordje Jevdjic
Dear Jinchun, The warning message that you get is irrelevant. The problem seems to be in the amount of memory that is given to the map-reduce tasks. You need to increase the heap size (e.g., run with -Xmx2048M) and make sure that you have enough DRAM for the heap size you indicate. To change the hea
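
As a rough illustration of the sizing advice in this message, the sketch below checks whether a chosen per-task heap fits in physical memory when several tasks run at once. The function name, the task count, and the 4096 MB reserved for the OS and Hadoop daemons are assumptions made for illustration; they do not come from the thread.

```python
def heap_fits_in_dram(num_tasks, heap_mb, dram_mb, reserved_mb=4096):
    """Return True if num_tasks JVMs with heap_mb of heap each fit in dram_mb.

    reserved_mb is a hypothetical allowance for the OS and Hadoop daemons.
    """
    return num_tasks * heap_mb + reserved_mb <= dram_mb


# Example (assumed numbers): 8 concurrent tasks, 2048 MB heap each, 32 GB of DRAM.
print(heap_fits_in_dram(num_tasks=8, heap_mb=2048, dram_mb=32 * 1024))
```

If the check fails, either the heap per task or the number of concurrent tasks has to come down.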

RE: Question about data analytic

2013-03-24 Thread Djordje Jevdjic
: Jinchun Kim [cien...@gmail.com] Sent: Friday, March 22, 2013 3:04 PM To: Djordje Jevdjic Cc: cloudsuite@listes.epfl.ch Subject: Re: Question about data analytic Thanks Djordje :) I was able to prepare the input data file and now I'm trying to create category-based splits of Wikipedia da

RE: Question about data analytic

2013-03-24 Thread Djordje Jevdjic
heap. Regards, Djordje From: Jinchun Kim [cien...@gmail.com] Sent: Monday, March 25, 2013 12:56 AM To: Djordje Jevdjic Cc: cloudsuite@listes.epfl.ch Subject: Re: Question about data analytic Thanks Djordje. The heap size indicated in mapred-site.xml is set to -Xmx

RE: Segmentation Fault while running Data Caching

2013-07-01 Thread Djordje Jevdjic
Dear Tri, Thanks for pointing this out. Please use the updated instructions from the web (the scaling factor in the first command is also updated). Regards, Djordje From: Tri M. Nguyen [t...@princeton.edu] Sent: Monday, July 01, 2013 9:22 PM To: cloudsu

RE: [cloudsuite] Maximum throughput in data caching?

2013-07-10 Thread Djordje Jevdjic
Hello Binh, The "requests" column tells you how many requests were served during the last statistics interval (1s in your case, because of -T 1). The actual throughput is the second column, rps (requests per second). The command you ran is used to estimate the maximum throughput (rps) you can ac
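
The relation between the two columns described in this message can be sketched as a one-line calculation. The function name and the sample numbers are hypothetical; only the relation itself (requests served in an interval divided by the interval length) is taken from the message.

```python
def requests_per_second(requests, interval_s):
    """Throughput over one statistics interval of interval_s seconds."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return requests / interval_s


# With a 1-second interval (-T 1) the two columns carry the same number.
print(requests_per_second(requests=50000, interval_s=1))
# With a longer interval (assumed here to be 5 s) they differ.
print(requests_per_second(requests=50000, interval_s=5))
```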

RE: v2.0 changelog

2013-07-21 Thread Djordje Jevdjic
Dear Mahmood, CloudSuite 2.0 introduces two new benchmarks: DataCaching and Graph Analytics. Regarding CloudSuite 1.0 benchmarks, we are currently upgrading the software packages and updating the benchmarks. Once we are done, we will post the change log on the website. Regards, Djordje __

RE: Crash while initializing sys-uarch

2013-08-04 Thread Djordje Jevdjic
Dear Yarong, This error usually implies that you have misconfigured the simulator in the wiring file. In other words, your wiring.cpp file is not consistent with the config file. It is most likely related to the number of memory controllers (unless you added your own components that you did

RE: [cloudsuite] memcached client issue

2013-08-09 Thread Djordje Jevdjic
Hi Xiao, I didn't understand what the problem was, but here are some hints: 1. Your server.txt file must contain the correct information about the server(s) you want to work with. 2. You run this for a second only (-t 1), and it will exit immediately, as it did. 3. I see that you are using obj

RE: running data-caching magic number error

2013-08-27 Thread Djordje Jevdjic
Hello Kazi, There is no need to be root when running this benchmark. Could you let me know the exact Memcached version you are using? I guess you wanted to say "-S 30" in your command, rather than -S 3. I see that you are using two servers at the same time. I suggest you create two files: s

RE: running data-caching magic number error

2013-08-27 Thread Djordje Jevdjic
if the segmentation fault is still there, please run the client with gdb and send us the stack trace once it crashes. Regards, Djordje From: Kazi Sudipto Arif [sudipto.a...@gmail.com] Sent: Tuesday, August 27, 2013 5:54 PM To: Djordje Jevdjic Subject: Re

RE: [cloudsuite] setup question

2013-09-05 Thread Djordje Jevdjic
Dear Reza, To set up the Data Analytics benchmark you will need around 100GB of free disk space on one machine. You can remove the temporary files once you are done with the set-up phase. To run the benchmark on several machines, you will need less than 10GB of disk space per machine. In an

RE: [cloudsuite] data analytics

2013-10-29 Thread Djordje Jevdjic
Hello, Seems that you ran out of memory and your system is doing garbage collection all the time. You should try adjusting the number of concurrent map processes and/or the amount of memory allocated to each process. Regards, Djordje From: Wu, Jie Ying

RE: [datacaching] Bad command line option for memcached

2013-11-14 Thread Djordje Jevdjic
Hi Marco, Thanks for your e-mail. Indeed, the command line has an error. "-D" is used to configure the memory of the client during warmup. On the server side, you should use "-M". We will fix the documentation. Thanks again for pointing this out. Regards, Djordje

RE: [datacaching] Bad command line option for memcached

2013-11-14 Thread Djordje Jevdjic
To: cloudsuite@listes.epfl.ch Subject: Re: [datacaching] Bad command line option for memcached On 11/14/2013 10:44 AM, Djordje Jevdjic wrote: > Hi Marco, > > Thanks for your e-mail. Hi Djordje, Thank you for the reply. > Indeed, the command line has an error. "-D" is used to conf

RE: [datacaching] Max throughput rps

2013-11-16 Thread Djordje Jevdjic
Hi Marco, That command is used to quickly estimate the maximum throughput a server can achieve, just to give you a hint for tuning. No need to run it for a whole day. It is not important that the number is very precise. Pick any. You need to play with the load on the client (using "-r"), incre

RE: Error while running the server in datacaching

2013-12-24 Thread Djordje Jevdjic
Hello Sneha, Nothing is supposed to be displayed. The server is running from the moment you enter the command. You may even want to run it in the background, by adding "&" at the end of the line. Regards, Djordje From: Sneha Sathyanarayana [sneha.am...@g

Rigorous and Practical Server Design Evaluation Tutorial

2014-01-12 Thread Djordje Jevdjic
Dear all, We are happy to announce that we will hold an interactive tutorial at ASPLOS in which you can learn about CloudSuite and the Flexus simulation infrastructure. More importantly, you will have the opportunity to learn how to correctly and rigorously evaluate server designs using real-wo

RE: Hadoop fails with 16+ mappers

2014-01-15 Thread Djordje Jevdjic
Hello Ahmad, I took a look at your config files, and they seem correct. The limitation of 14 mappers and 2 reducers is weird; it suggests that you can't utilize more than 32GB for whatever reason. I've been able to run with more processes on a weaker machine. Have you ever been able to utilize more th

RE: Clarification on Data Caching

2014-01-28 Thread Djordje Jevdjic
Dear Suhasini, Nagle's algorithm is not related to the benchmark. It’s a TCP/IP optimization that does not work well for this benchmark due to the size of the packets that are transmitted between the client and the server. So, the default (and in this case the best) option is to turn it off.

RE: [cloudsuite] questions about Data Caching of CloudSuite

2014-02-05 Thread Djordje Jevdjic
Hello Kun, Here is the legend for the output: timediff - the measurement period T (1s in your case) rps - requests per second during the last T requests - total number of requests completed within the last T (if T=1s, equal to rps) gets - number of completed get requests during the last T

Rigorous and Practical Server Design evaluation - Tutorial at ASPLOS'14

2014-02-05 Thread Djordje Jevdjic
Dear all, Just to remind you that the early registration deadline for our tutorial is February 10th and we still have a few empty slots for you! The tutorial will be held in conjunction with ASPLOS'14 and you will have the opportunity to learn about CloudSuite and the Flexus simulation infr

RE: A question about twitter data set of Data Caching

2014-03-18 Thread Djordje Jevdjic
Dear Chao, The input file combines both the object popularity and the object size distribution. That’s why the sizes are not sorted and some sizes may even repeat. Regards, Djordje From: Roy Lee [roy.q@gmail.com] Sent: Thursday, March 06, 2014 8:16

RE: questions about the cloudsuite-data caching

2014-07-11 Thread Djordje Jevdjic
Hello Wei, A 90th percentile value of, for example, 2ms means that 90% of the requests experience a latency of up to 2ms. The same applies to the 95th percentile. The target latency you want to achieve depends on your frontend application, but it’s typically 5-10ms. Regarding the throughput, you first
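
To make the percentile definition in this message concrete, here is a minimal sketch that uses the nearest-rank method on a list of latency samples. The method choice, the function name, and the sample values are assumptions for illustration; the benchmark client may compute its percentiles differently.

```python
import math


def percentile(latencies_ms, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not latencies_ms:
        raise ValueError("latencies_ms must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(latencies_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latency samples in milliseconds.
samples = [0.5, 0.8, 1.0, 1.1, 1.2, 1.4, 1.6, 1.9, 2.0, 7.5]
print(percentile(samples, 90))
print(percentile(samples, 95))
```

With these samples, nine of the ten requests complete within the value returned for p=90, while the single slow request only shows up at p=95.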

RE: Is the memcached client capable of batching ?

2014-08-28 Thread Djordje Jevdjic
Dear Hamza, Thanks for your interest in CloudSuite. Unfortunately, the client does not support multi-get requests. We do have plans to implement that functionality (which is used in Facebook-like settings) at some point in the future. It will probably happen with the next release of CloudSuit

RE: Flexus plus Cloudsuite image simulation requirements

2014-10-10 Thread Djordje Jevdjic
Dear Hang Lu, The hardware requirements for Flexus jobs depend on the type of jobs you are running and the workload. If you are running timing simulations with sampling, a single job requires no more than 1GB of RAM and one core or hardware thread (if hyperthreading should be enabled fo

RE: Building datacache load tester on SPARC running Solaris

2014-11-06 Thread Djordje Jevdjic
, Djordje From: kishore kumar [kishoregupt...@gmail.com] Sent: Tuesday, November 04, 2014 7:49 PM To: Djordje Jevdjic Cc: cloudsuite@listes.epfl.ch Subject: Building datacache load tester on SPARC running Solaris Hi, First of all, thank you very much for making

RE: CloudStone guide

2014-12-02 Thread Djordje Jevdjic
From: moslem mosadegh [m.mosadeg...@gmail.com] Sent: Tuesday, December 02, 2014 2:24 PM To: Djordje Jevdjic Subject: CloudStone guide Hi. I'm a master student of computer and working on cloudstone benchmark. I searched the net and found out your group "parsa"