Hi Mingda,
What Spark version did you use for your benchmark, and what is the
SPARK_SLAVE_MEMORY budget? Also, what is the file format for Spark (Parquet
or CSV)?
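(For reference, in a Spark standalone cluster of that era the memory budget is usually set along these lines; the variable/property names below follow the standard Spark standalone docs, and the 100g/90g values are made-up placeholders, not numbers from this benchmark:)

```
# spark-env.sh (standalone mode): total memory a worker may hand to executors
SPARK_WORKER_MEMORY=100g

# spark-defaults.conf: per-executor memory budget
spark.executor.memory 90g
```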
On Wed, Dec 21, 2016 at 9:20 AM, Yingyi Bu wrote:
> Hi Mingda,
>
> I think that in your setting, a better
Mingda,
1. Can you paste the returned JSON of http://<master-node>:19002/admin/cluster at your side? (Please replace <master-node> with the
actual master node name or IP.)
2. Can you list the individual size of each dataset involved in the
query, e.g., catalog_returns, catalog_sales, and inventory? (I assume
100GB
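(As a side note, the cluster-state check in point 1 can be scripted; below is a minimal sketch. Port 19002 and the `/admin/cluster` path come from the message above, but the JSON field names used here — `state`, `ncs`, `node_id` — are an assumption about the response shape and may differ across AsterixDB versions:)

```python
import json
from urllib.request import urlopen

def summarize_cluster(info):
    """Return (cluster_state, {node_id: node_state}) from /admin/cluster JSON.

    Field names ("state", "ncs", "node_id") are assumptions about the
    response shape; adjust them to match your AsterixDB version.
    """
    nodes = {nc.get("node_id", "?"): nc.get("state", "?")
             for nc in info.get("ncs", [])}
    return info.get("state", "UNKNOWN"), nodes

def fetch_cluster_state(master_host):
    # Hypothetical helper: queries the admin endpoint on the default port.
    with urlopen(f"http://{master_host}:19002/admin/cluster") as resp:
        return summarize_cluster(json.load(resp))

# Example against a canned response (no running cluster needed):
sample = {"state": "ACTIVE",
          "ncs": [{"node_id": "nc1", "state": "ACTIVE"},
                  {"node_id": "nc2", "state": "ACTIVE"}]}
state, nodes = summarize_cluster(sample)
print(state, nodes)  # ACTIVE {'nc1': 'ACTIVE', 'nc2': 'ACTIVE'}
```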
BTW: for the 1-10G figure's labels, 'W' means wrong order while 'R' means
right order. We always put the wrong-order join on the left side and the
right-order join on the right side.
For the 1-100G figure, we didn't add labels, but the wrong-order join is
also on the left side.
On Tue, Dec 20, 2016 at 5:44 PM, mingda li
Oh, sure. When we tested the 100G multi-way join, we found AsterixDB was slower
than Spark (but still faster than Pig and Hive).
I can share both plots with you: 1-10G.eps and 1-100G.eps. (We will
only use 1-10G.eps in our paper.)
And thanks for Ian's advice: the dev list generally strips attachments.
Mingda: can you check again, please? I'm not seeing the plots.
-Tyson
From: mingda li [mailto:limingda1...@gmail.com]
Sent: Tuesday, December 20, 2016 4:16 PM
To: dev@asterixdb.apache.org
Cc: Michael Carey ; Tyson Condie
Subject: Re: Time of
Thanks, Abdullah. The patch is already on Jenkins:
https://asterix-gerrit.ics.uci.edu/#/c/1388/ - patch set 9. I am not
trying to touch the internals of the indexing part. What I'm trying to do is
just create an inverted-index-type index. :-) Anyway, thanks for the
information.
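(For context, creating an inverted index in AsterixDB is a one-line DDL statement; `keyword` and `ngram(n)` are the inverted-index types AsterixDB documents, but the dataset and field names below are made up for illustration:)

```sql
-- Hypothetical dataset "Messages" with a string field "body";
-- only the index types (KEYWORD, NGRAM) are taken from AsterixDB's docs.
CREATE INDEX msgKeywordIdx ON Messages(body) TYPE KEYWORD;
CREATE INDEX msgNgramIdx   ON Messages(body) TYPE NGRAM(3);
```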
Best,
Taewoo
On