Re: Kudu datastore reports
Hi Alfonso.

Thanks so much for your feedback. I am working on your comments.

Best,
John

On Mon, Jun 10, 2019 at 16:11, Alfonso Nishikawa (<alfonso.nishik...@gmail.com>) wrote:

> Hi, John.
>
> Regarding your questions at the report [1]:
>
> >- How to represent partitioning configurations on the mapping file.
>
> This was discussed in other emails, wasn't it? :)
>
> >- KuduTestHarness requires the Maven plugin os-maven-plugin, which
> >needs Maven 3.1.1+, is it a problem for Apache Gora?
>
> I believe it is not a problem. My Ubuntu comes with 3.6.0, far from 3.1.1,
> and I assume everyone uses a fairly recent version of Maven 3 :)
>
> [1] -
> https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>
> Regards,
>
> Alfonso Nishikawa
>
> On Mon, Jun 10, 2019 at 21:07, Alfonso Nishikawa (<alfonso.nishik...@gmail.com>) wrote:
>
>> Hi, John.
>>
>> Thank you!
>> Things I have seen:
>>
>> - The version of a Maven dependency [1] should go in the Dependency
>> Management section of the root pom [2]. Same for [3] and the dependencies
>> from there on; the version should not be set in the module's pom.
>> - Set test dependencies' scope to test, at [4] and from there on.
>> - Set the indentation to 2 spaces for the pom [5].
>> - Missing "t" in "localhost" at [6].
>> - Port 13 for Kudu? That is the "Daytime Protocol" (RFC 867) and you will
>> need root permission to run it. The default port for Kudu is 7051, isn't it?
>> - I would ask you to add the same functionality to load the mapping from
>> configuration as in HBase's store [7] in your KuduStore [8]. This will have
>> implications for your readMapping at [9], so take a look at the one for
>> HBase at [10].
>> - I know it is done in other backends, but avoid RuntimeExceptions (at least
>> in Java, since we have the checked ones) like in [11]. You can wrap them in
>> GoraException. An example is [12].
>>
>> And nothing more :)
>> Keep going, good job.
>>
>> [1] -
>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml#L98
>> [2] - https://github.com/jhnmora000/gora/blob/GORA-485/pom.xml#L890
>> [3] -
>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml#L121
>> [4] -
>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml#L180
>> [5] - https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml
>> [6] -
>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/test/resources/gora.properties#L18
>> [7] -
>> https://github.com/jhnmora000/gora/blob/master/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L92
>> [8] -
>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/store/KuduStore.java#L53
>> [9] -
>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/mapping/KuduMappingBuilder.java#L81
>> [10] -
>> https://github.com/jhnmora000/gora/blob/master/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L822
>> [11] -
>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/mapping/KuduMappingBuilder.java#L141
>> [12] -
>> https://github.com/jhnmora000/gora/blob/master/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L268
>>
>> Regards,
>>
>> Alfonso Nishikawa
>>
>> On Sat, Jun 8, 2019 at 20:26, John Mora wrote:
>>
>>> Hi all.
>>>
>>> I have just updated my weekly reports on Cwiki [1]. This next week I
>>> think I should be focusing on the create schema operation and solving the
>>> issue of the partitioning configurations in the mapping file.
>>>
>>> Please let me know if you have suggestions. My latest commits are
>>> available here [2].
>>>
>>> [1]
>>> https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>>>
>>> Best,
>>> John
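The review point about avoiding RuntimeExceptions can be illustrated with a small sketch. It is not code from KuduMappingBuilder: `MappingException` is a hypothetical stand-in for Gora's checked GoraException, and `readPort` and the property key are invented for the example. The idea shown is only that an unchecked failure is caught and rethrown as a checked exception that keeps the original cause.

```java
import java.util.Properties;

/** Hypothetical checked exception, standing in for Gora's GoraException. */
class MappingException extends Exception {
  MappingException(String message, Throwable cause) {
    super(message, cause);
  }
}

class PortSettingSketch {
  /**
   * Reads a port from the properties. A malformed value is reported as a
   * checked MappingException that keeps the original cause, instead of
   * leaking an unchecked NumberFormatException to the caller.
   */
  static int readPort(Properties props, String key, int defaultPort)
      throws MappingException {
    String raw = props.getProperty(key);
    if (raw == null) {
      return defaultPort; // nothing configured: fall back to the default
    }
    try {
      return Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      throw new MappingException("Invalid port for " + key + ": " + raw, e);
    }
  }

  public static void main(String[] args) throws MappingException {
    Properties props = new Properties();
    props.setProperty("sketch.kudu.master.port", "7051"); // hypothetical key
    System.out.println(readPort(props, "sketch.kudu.master.port", 7051));
  }
}
```

Callers are then forced by the compiler to handle or declare the failure, which is the benefit of checked exceptions that the review refers to.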
Re: Redis datastore
Thanks, Alfonso, for your comments. I will analyze the code from HBase. Thanks for your suggestion.

Also, my code is in master, but I will move to branch 527; it will be easier to trace changes.

Best,
Xavier.

> On Jun 10, 2019, at 18:20, Alfonso Nishikawa wrote:
>
> Hi, Xavier.
>
> I can't find the GORA-527 branch that your report mentions.
> What I would like to ask for is to add the same functionality to load the
> mapping from configuration in your RedisStore [1] as in HBase's store [2].
> This will have implications for your readMapping, which in HBase was done by
> passing an InputStream instead of a file name.
>
> [1] -
> https://github.com/cuent/gora/blob/master/gora-redis/src/main/java/org/apache/gora/redis/store/RedisStore.java#L78
> [2] -
> https://github.com/jhnmora000/gora/blob/master/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L92
>
> Regards,
>
> Alfonso Nishikawa
>
> On Mon, Jun 10, 2019 at 4:16, FRANCISCO XAVIER SUMBA TORAL wrote:
>
>> Thanks for your comments.
>>
>> I pushed my commits and updated the report.
>>
>> During this week I want to decide which redisson codec works best for
>> gora's use case, and I will also address the comments on the libraries
>> comparison. I will keep you updated during the week on any blockers or
>> new tasks that come along.
>>
>> Best,
>> Xavier.
>>
>>> On Jun 9, 2019, at 23:43, carlos muñoz wrote:
>>>
>>> Hi Xavier
>>>
>>> The document looks great. I have left a few comments. I would like to
>>> read other opinions.
>>> Also, please make sure to publish your weekly progress on the Wiki space.
>>>
>>> Regards,
>>> Carlos
>>>
>>> On Fri, Jun 7, 2019 at 23:01, FRANCISCO XAVIER SUMBA TORAL wrote:
>>>
>>>> Sorry, try again. Anybody should be able to access it now. I set it for everyone.
>>>>
>>>> https://docs.google.com/document/d/17RlGIu_SaPo7O2J7k_htg1UDDO9ah41u8cCYUURC7BM/edit?usp=sharing
>>>>
>>>> Best,
>>>> Xavier
>>>>
>>>>> On Jun 7, 2019, at 23:59, Kevin Ratnasekera wrote:
>>>>>
>>>>> Hi Xavier,
>>>>>
>>>>> I requested access to the docs. Can you please give permissions?
>>>>>
>>>>> Regards
>>>>> Kevin
>>>>>
>>>>> On Sat, Jun 8, 2019 at 9:23 AM FRANCISCO XAVIER SUMBA TORAL wrote:
>>>>>
>>>>>> Hello
>>>>>>
>>>>>> I think that redisson is the option to support redis in gora.
>>>>>>
>>>>>> There is an analysis here [1]. What do you think? After some suggestions I
>>>>>> can add those results to the wiki.
>>>>>>
>>>>>> Best
>>>>>> Xavier
>>>>>>
>>>>>> [1]
>>>>>> https://docs.google.com/document/d/17RlGIu_SaPo7O2J7k_htg1UDDO9ah41u8cCYUURC7BM/edit?usp=drivesdk
>>>>>>
>>>>>> On Wed, Jun 5, 2019, 9:16 AM FRANCISCO XAVIER SUMBA TORAL, <xavier.sumb...@ucuenca.edu.ec> wrote:
>>>>>>
>>>>>>> On Wed, Jun 5, 2019, 1:31 AM Kevin Ratnasekera, <djkevincr1...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Xavier,
>>>>>>>>
>>>>>>>> Thank you for the update. Take your time on the research for selecting
>>>>>>>> the Redis client library. You don't need to compare all Redis clients; take
>>>>>>>> the few that look most promising (by comparing community, functionality,
>>>>>>>> etc.). There are only very few recommended by redis.io [1] (Jedis, Lettuce
>>>>>>>> and Redisson). Let's focus on these 3, looking at high-level data
>>>>>>>> structures etc. Let's do a comparison once you complete that research work.
>>>>>>>
>>>>>>> Okay, I will work on that comparison.
>>>>>>>
>>>>>>>> As Carlos mentioned, if you have trouble setting up an embedded server,
>>>>>>>> you could always use [2] to spin up a Redis server instance from a docker
>>>>>>>> image. This is the same approach we have taken in the Aerospike and
>>>>>>>> CouchDB datastore tests. That way you can spin up a real instance of the
>>>>>>>> Redis server, and you won't have any of the limitations of these mock servers.
>>>>>>>
>>>>>>> Thanks, I am looking into the aerospike implementation.
>>>>>>>
>>>>>>>> [1] https://redis.io/clients#java
>>>>>>>> [2] https://www.testcontainers.org/
>>>>>>>>
>>>>>>>> Regards
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Xavier
>>>>>>
>>>>>> --
>>>>>> Legal notice:
>>>>>> This message and, where applicable, any attached files are
>>>>>> confidential, especially with regard to personal data, and are
>>>>>> addressed exclusively to the referenced recipient. If you are not the
>>>>>> recipient and have received it by mistake or have become aware of it for
>>>>>> any reason, please notify us by this means and destroy or delete it, and
>>>>>> in any case refrain from using, reproducing, altering, archiving or
>>>>>> communicating to third parties this message and the attached files,
>>>>>> all under penalty of incurring
Re: Week 2 Report and A Question
Hello Alfonso and Renato,

Thank you for getting in touch and thanks for the detailed replies. I will have a proper look at this tomorrow morning.

I did some troubleshooting yesterday (mostly playing with Xmx and ZooKeeper timeout settings); that improved the conditions, but it did not entirely solve the problem. Preliminarily, it seems the problem has to do with configuration or with how HBaseStore is implemented (this may not be entirely true).

I will keep you all posted once I have had a thorough look at your suggestions.

Thanks again.

**Sheriffo Ceesay**

On Mon, Jun 10, 2019 at 11:14 PM Alfonso Nishikawa <alfonso.nishik...@gmail.com> wrote:

> Hi!
>
> My hypothesis is that the difference between MongoDB and HBase is that
> HBase puts more stress on serializing with Avro. It could also matter that,
> if the HBase test is performed after the MongoDB ones, the GC starts from
> a "bad" situation.
>
> From [A] linked by @Renato, if the error was OutOfMemoryException I would
> have recommended lowering gora.hbasestore.scanner.caching to 100, 10 or
> even 1, but with a GC error I am not so sure. In any case, @Sheriffo:
> you can try this if it still doesn't work with the optimizations :)
>
> @Renato: Thx for the links!
>
> Regards,
>
> Alfonso Nishikawa
>
> On Mon, Jun 10, 2019 at 22:02, Renato Marroquín Mogrovejo (<renatoj.marroq...@gmail.com>) wrote:
>
>> @Alfonso,
>> Thank you very much for the suggestions! You are totally right about
>> all of your points! Sheriffo, please benefit from them ;)
>>
>> Also, what is strange (although it can be optimized as Alfonso
>> pointed out) is that it works for the MongoDB backend. So I would also
>> suspect the configuration of the Gora-HBase client. Have you taken
>> a look at [A], for example? Or at other assumed Gora-HBase configurations
>> [B]? Maybe there you can specify some Xmx / Xms config.
>>
>> Best,
>>
>> Renato M.
>>
>> [A]
>> https://github.com/sneceesay77/gora/blob/master/gora-hbase/src/test/conf/gora.properties
>> [B]
>> https://github.com/sneceesay77/gora/blob/master/gora-hbase/src/test/conf/hbase-site.xml
>>
>> On Mon, Jun 10, 2019 at 23:39, Alfonso Nishikawa wrote:
>>
>>> Hi again, Sheriffo.
>>>
>>> More improvements to [1] over the last email:
>>>
>>> - fields.toArray() doesn't need a full array like in [6]. You should do
>>> just fields.toArray(new String[0]), and better if you create an array [0]
>>> and reuse it. That call only needs the type.
>>> - I guess the class at [2] will always be the same, so you don't need to
>>> set it on every insert call.
>>> - The string concatenation is overkill for the JVM on the 1M calls * N
>>> fields at [3], and the same for [4]. Precalculate the names in a list or
>>> array and reuse them for the 1M*N calls.
>>> - Another optimization for [3] is, given that PersistentBase [5] extends
>>> SpecificRecordBase, you can access the fields by index with
>>> SpecificRecordBase.get(int) and SpecificRecordBase.put(int, Object).
>>>
>>> [1] -
>>> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L127
>>> [2] -
>>> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L134
>>> [3] -
>>> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L136
>>> [4] -
>>> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L139
>>> [5] -
>>> https://github.com/sneceesay77/gora/blob/GORA-532/gora-core/src/main/java/org/apache/gora/persistency/impl/PersistentBase.java#L3
>>> [6] -
>>> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L163
>>>
>>> Let's see if with those optimizations we relieve the JVM memory management
>>> of much of the stress.
>>>
>>> Regards,
>>>
>>> Alfonso Nishikawa
>>>
>>> On Mon, Jun 10, 2019 at 21:18, Alfonso Nishikawa (<alfonso.nishik...@gmail.com>) wrote:
>>>
>>>> Hi, Sheriffo.
>>>>
>>>> You can try reusing the Persistent instances [1] to insert the data. I
>>>> don't know all the backends, but they should be reusable, at least in
>>>> MongoDB and HBase.
>>>>
>>>> [1] -
>>>> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L130
>>>>
>>>> Regards,
>>>>
>>>> Alfonso Nishikawa
>>>>
>>>> On Mon, Jun 10, 2019 at 21:14, Alfonso Nishikawa (<alfonso.nishik...@gmail.com>) wrote:
>>>>
>>>>> Hi, Sheriffo.
>>>>>
>>>>> I really don't
Re: Redis datastore
Hi, Xavier.

I can't find the GORA-527 branch that your report mentions.
What I would like to ask for is to add the same functionality to load the mapping from configuration in your RedisStore [1] as in HBase's store [2]. This will have implications for your readMapping, which in HBase was done by passing an InputStream instead of a file name.

[1] - https://github.com/cuent/gora/blob/master/gora-redis/src/main/java/org/apache/gora/redis/store/RedisStore.java#L78
[2] - https://github.com/jhnmora000/gora/blob/master/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L92

Regards,

Alfonso Nishikawa

On Mon, Jun 10, 2019 at 4:16, FRANCISCO XAVIER SUMBA TORAL wrote:

> Thanks for your comments.
>
> I pushed my commits and updated the report.
>
> During this week I want to decide which redisson codec works best for
> gora's use case, and I will also address the comments on the libraries
> comparison. I will keep you updated during the week on any blockers or
> new tasks that come along.
>
> Best,
> Xavier.
>
>> On Jun 9, 2019, at 23:43, carlos muñoz wrote:
>>
>> Hi Xavier
>>
>> The document looks great. I have left a few comments. I would like to
>> read other opinions.
>> Also, please make sure to publish your weekly progress on the Wiki space.
>>
>> Regards,
>> Carlos
>>
>> On Fri, Jun 7, 2019 at 23:01, FRANCISCO XAVIER SUMBA TORAL wrote:
>>
>>> Sorry, try again. Anybody should be able to access it now. I set it for everyone.
>>>
>>> https://docs.google.com/document/d/17RlGIu_SaPo7O2J7k_htg1UDDO9ah41u8cCYUURC7BM/edit?usp=sharing
>>>
>>> Best,
>>> Xavier
>>>
>>>> On Jun 7, 2019, at 23:59, Kevin Ratnasekera wrote:
>>>>
>>>> Hi Xavier,
>>>>
>>>> I requested access to the docs. Can you please give permissions?
>>>>
>>>> Regards
>>>> Kevin
>>>>
>>>> On Sat, Jun 8, 2019 at 9:23 AM FRANCISCO XAVIER SUMBA TORAL wrote:
>>>>
>>>>> Hello
>>>>>
>>>>> I think that redisson is the option to support redis in gora.
>>>>>
>>>>> There is an analysis here [1]. What do you think? After some suggestions I
>>>>> can add those results to the wiki.
>>>>>
>>>>> Best
>>>>> Xavier
>>>>>
>>>>> [1]
>>>>> https://docs.google.com/document/d/17RlGIu_SaPo7O2J7k_htg1UDDO9ah41u8cCYUURC7BM/edit?usp=drivesdk
>>>>>
>>>>> On Wed, Jun 5, 2019, 9:16 AM FRANCISCO XAVIER SUMBA TORAL, <xavier.sumb...@ucuenca.edu.ec> wrote:
>>>>>
>>>>>> On Wed, Jun 5, 2019, 1:31 AM Kevin Ratnasekera, <djkevincr1...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Xavier,
>>>>>>>
>>>>>>> Thank you for the update. Take your time on the research for selecting
>>>>>>> the Redis client library. You don't need to compare all Redis clients; take
>>>>>>> the few that look most promising (by comparing community, functionality,
>>>>>>> etc.). There are only very few recommended by redis.io [1] (Jedis, Lettuce
>>>>>>> and Redisson). Let's focus on these 3, looking at high-level data
>>>>>>> structures etc. Let's do a comparison once you complete that research work.
>>>>>>
>>>>>> Okay, I will work on that comparison.
>>>>>>
>>>>>>> As Carlos mentioned, if you have trouble setting up an embedded server,
>>>>>>> you could always use [2] to spin up a Redis server instance from a docker
>>>>>>> image. This is the same approach we have taken in the Aerospike and
>>>>>>> CouchDB datastore tests. That way you can spin up a real instance of the
>>>>>>> Redis server, and you won't have any of the limitations of these mock servers.
>>>>>>
>>>>>> Thanks, I am looking into the aerospike implementation.
>>>>>>
>>>>>>> [1] https://redis.io/clients#java
>>>>>>> [2] https://www.testcontainers.org/
>>>>>>>
>>>>>>> Regards
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Xavier
>>>>>
>>>>> --
>>>>> Legal notice:
>>>>> This message and, where applicable, any attached files are
>>>>> confidential, especially with regard to personal data, and are
>>>>> addressed exclusively to the referenced recipient. If you are not the
>>>>> recipient and have received it by mistake or have become aware of it for
>>>>> any reason, please notify us by this means and destroy or delete it, and
>>>>> in any case refrain from using, reproducing, altering, archiving or
>>>>> communicating to third parties this message and the attached files,
>>>>> all under penalty of incurring legal liability. The opinions contained in
>>>>> this message and in the attached files belong exclusively to the sender
>>>>> and do not represent the opinion of the Universidad de Cuenca unless
>>>>> expressly stated and the sender is authorized to do so. The sender does
>>>>> not guarantee the integrity, speed or security of this email, nor
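The request to load the mapping from configuration, with readMapping taking an InputStream instead of a file name, can be sketched roughly as follows. This is an illustration under stated assumptions and not the HBaseStore or RedisStore code: the class, the property key `sketch.mapping.xml` and the file name `sketch-mapping.xml` are hypothetical, and the "reader" only returns the text where a real store would parse the XML.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class MappingSourceSketch {
  // Hypothetical names, chosen only for this sketch.
  static final String INLINE_MAPPING_KEY = "sketch.mapping.xml";
  static final String DEFAULT_MAPPING_FILE = "sketch-mapping.xml";

  /** Inline XML from the configuration wins; otherwise use a classpath file. */
  static InputStream openMapping(Properties conf) throws IOException {
    String inlineXml = conf.getProperty(INLINE_MAPPING_KEY);
    if (inlineXml != null) {
      return new ByteArrayInputStream(inlineXml.getBytes(StandardCharsets.UTF_8));
    }
    InputStream in = MappingSourceSketch.class.getClassLoader()
        .getResourceAsStream(DEFAULT_MAPPING_FILE);
    if (in == null) {
      throw new IOException("Mapping not found on classpath: " + DEFAULT_MAPPING_FILE);
    }
    return in;
  }

  /** The reader only sees a stream, so it does not care where the mapping came from. */
  static String readMapping(InputStream in) throws IOException {
    try (InputStream stream = in) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buffer = new byte[4096];
      int n;
      while ((n = stream.read(buffer)) != -1) {
        out.write(buffer, 0, n);
      }
      return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) throws IOException {
    Properties conf = new Properties();
    conf.setProperty(INLINE_MAPPING_KEY, "<mapping/>");
    System.out.println(readMapping(openMapping(conf)));
  }
}
```

Separating "where the bytes come from" from "how they are parsed" is what lets the same reader serve both a mapping file and a mapping supplied through configuration.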
Re: Week 2 Report and A Question
Hi!

My hypothesis is that the difference between MongoDB and HBase is that HBase puts more stress on serializing with Avro. It could also matter that, if the HBase test is performed after the MongoDB ones, the GC starts from a "bad" situation.

From [A] linked by @Renato, if the error was OutOfMemoryException I would have recommended lowering gora.hbasestore.scanner.caching to 100, 10 or even 1, but with a GC error I am not so sure. In any case, @Sheriffo: you can try this if it still doesn't work with the optimizations :)

@Renato: Thx for the links!

Regards,

Alfonso Nishikawa

On Mon, Jun 10, 2019 at 22:02, Renato Marroquín Mogrovejo (<renatoj.marroq...@gmail.com>) wrote:

> @Alfonso,
> Thank you very much for the suggestions! You are totally right about
> all of your points! Sheriffo, please benefit from them ;)
>
> Also, what is strange (although it can be optimized as Alfonso
> pointed out) is that it works for the MongoDB backend. So I would also
> suspect the configuration of the Gora-HBase client. Have you taken
> a look at [A], for example? Or at other assumed Gora-HBase configurations
> [B]? Maybe there you can specify some Xmx / Xms config.
>
> Best,
>
> Renato M.
>
> [A]
> https://github.com/sneceesay77/gora/blob/master/gora-hbase/src/test/conf/gora.properties
> [B]
> https://github.com/sneceesay77/gora/blob/master/gora-hbase/src/test/conf/hbase-site.xml
>
> On Mon, Jun 10, 2019 at 23:39, Alfonso Nishikawa wrote:
>
>> Hi again, Sheriffo.
>>
>> More improvements to [1] over the last email:
>>
>> - fields.toArray() doesn't need a full array like in [6]. You should do
>> just fields.toArray(new String[0]), and better if you create an array [0]
>> and reuse it. That call only needs the type.
>> - I guess the class at [2] will always be the same, so you don't need to
>> set it on every insert call.
>> - The string concatenation is overkill for the JVM on the 1M calls * N
>> fields at [3], and the same for [4]. Precalculate the names in a list or
>> array and reuse them for the 1M*N calls.
>> - Another optimization for [3] is, given that PersistentBase [5] extends
>> SpecificRecordBase, you can access the fields by index with
>> SpecificRecordBase.get(int) and SpecificRecordBase.put(int, Object).
>>
>> [1] -
>> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L127
>> [2] -
>> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L134
>> [3] -
>> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L136
>> [4] -
>> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L139
>> [5] -
>> https://github.com/sneceesay77/gora/blob/GORA-532/gora-core/src/main/java/org/apache/gora/persistency/impl/PersistentBase.java#L3
>> [6] -
>> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L163
>>
>> Let's see if with those optimizations we relieve the JVM memory management
>> of much of the stress.
>>
>> Regards,
>>
>> Alfonso Nishikawa
>>
>> On Mon, Jun 10, 2019 at 21:18, Alfonso Nishikawa (<alfonso.nishik...@gmail.com>) wrote:
>>
>>> Hi, Sheriffo.
>>>
>>> You can try reusing the Persistent instances [1] to insert the data. I
>>> don't know all the backends, but they should be reusable, at least in
>>> MongoDB and HBase.
>>>
>>> [1] -
>>> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L130
>>>
>>> Regards,
>>>
>>> Alfonso Nishikawa
>>>
>>> On Mon, Jun 10, 2019 at 21:14, Alfonso Nishikawa (<alfonso.nishik...@gmail.com>) wrote:
>>>
>>>> Hi, Sheriffo.
>>>>
>>>> I really don't know how to solve it, but are you setting any Xmx / Xms
>>>> configuration values?
>>>>
>>>> Regards,
>>>>
>>>> Alfonso Nishikawa
>>>>
>>>> On Sat, Jun 8, 2019 at 16:02, Sheriffo Ceesay (<sneceesa...@gmail.com>) wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> Week 2 progress update is available at
>>>>>
>>>>> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
>>>>>
>>>>> I have one question that I would like my mentors to advise on. I am still
>>>>> working on it but thought it would be good to report it because it is
>>>>> HBase specific.
>>>>>
>>>>> So the problem has to do with an OutOfMemory error when inserting 1M +
>>>>> records in HBase. This happens when I try to run the actual benchmark by
>>>>> first loading HBase with 1 million plus records. It works perfectly for
Re: Week 2 Report and A Question
@Alfonso,
Thank you very much for the suggestions! You are totally right about all of your points! Sheriffo, please benefit from them ;)

Also, what is strange (although it can be optimized as Alfonso pointed out) is that it works for the MongoDB backend. So I would also suspect the configuration of the Gora-HBase client. Have you taken a look at [A], for example? Or at other assumed Gora-HBase configurations [B]? Maybe there you can specify some Xmx / Xms config.

Best,

Renato M.

[A] https://github.com/sneceesay77/gora/blob/master/gora-hbase/src/test/conf/gora.properties
[B] https://github.com/sneceesay77/gora/blob/master/gora-hbase/src/test/conf/hbase-site.xml

On Mon, Jun 10, 2019 at 23:39, Alfonso Nishikawa wrote:

> Hi again, Sheriffo.
>
> More improvements to [1] over the last email:
>
> - fields.toArray() doesn't need a full array like in [6]. You should do
> just fields.toArray(new String[0]), and better if you create an array [0]
> and reuse it. That call only needs the type.
> - I guess the class at [2] will always be the same, so you don't need to
> set it on every insert call.
> - The string concatenation is overkill for the JVM on the 1M calls * N
> fields at [3], and the same for [4]. Precalculate the names in a list or
> array and reuse them for the 1M*N calls.
> - Another optimization for [3] is, given that PersistentBase [5] extends
> SpecificRecordBase, you can access the fields by index with
> SpecificRecordBase.get(int) and SpecificRecordBase.put(int, Object).
>
> [1] -
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L127
> [2] -
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L134
> [3] -
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L136
> [4] -
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L139
> [5] -
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-core/src/main/java/org/apache/gora/persistency/impl/PersistentBase.java#L3
> [6] -
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L163
>
> Let's see if with those optimizations we relieve the JVM memory management
> of much of the stress.
>
> Regards,
>
> Alfonso Nishikawa
>
> On Mon, Jun 10, 2019 at 21:18, Alfonso Nishikawa (<alfonso.nishik...@gmail.com>) wrote:
>
>> Hi, Sheriffo.
>>
>> You can try reusing the Persistent instances [1] to insert the data. I
>> don't know all the backends, but they should be reusable, at least in
>> MongoDB and HBase.
>>
>> [1] -
>> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L130
>>
>> Regards,
>>
>> Alfonso Nishikawa
>>
>> On Mon, Jun 10, 2019 at 21:14, Alfonso Nishikawa (<alfonso.nishik...@gmail.com>) wrote:
>>
>>> Hi, Sheriffo.
>>>
>>> I really don't know how to solve it, but are you setting any Xmx / Xms
>>> configuration values?
>>>
>>> Regards,
>>>
>>> Alfonso Nishikawa
>>>
>>> On Sat, Jun 8, 2019 at 16:02, Sheriffo Ceesay wrote:
>>>
>>>> Hi All,
>>>>
>>>> Week 2 progress update is available at
>>>>
>>>> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
>>>>
>>>> I have one question that I would like my mentors to advise on. I am still
>>>> working on it but thought it would be good to report it because it is
>>>> HBase specific.
>>>>
>>>> So the problem has to do with an OutOfMemory error when inserting 1M +
>>>> records in HBase. This happens when I try to run the actual benchmark by
>>>> first loading HBase with 1 million plus records. It works perfectly for
>>>> MongoDB but not HBase.
>>>>
>>>> So I am assuming this problem is specific to HBase. The stack trace is
>>>> given below.
>>>>
>>>> Exception in thread "Thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>>         at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
>>>>         at java.lang.StringCoding.encode(StringCoding.java:344)
>>>>         at java.lang.String.getBytes(String.java:918)
>>>>         at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:733)
>>>>         at org.apache.gora.hbase.util.HBaseByteInterface.toBytes(HBaseByteInterface.java:225)
>>>>         at org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:383)
Re: Week 2 Report and A Question
Hi again, Sheriffo.

More improvements to [1] over the last email:

- fields.toArray() doesn't need a full array like in [6]. You should do just fields.toArray(new String[0]), and better if you create an array [0] and reuse it. That call only needs the type.
- I guess the class at [2] will always be the same, so you don't need to set it on every insert call.
- The string concatenation is overkill for the JVM on the 1M calls * N fields at [3], and the same for [4]. Precalculate the names in a list or array and reuse them for the 1M*N calls.
- Another optimization for [3] is, given that PersistentBase [5] extends SpecificRecordBase, you can access the fields by index with SpecificRecordBase.get(int) and SpecificRecordBase.put(int, Object).

[1] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L127
[2] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L134
[3] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L136
[4] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L139
[5] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-core/src/main/java/org/apache/gora/persistency/impl/PersistentBase.java#L3
[6] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L163

Let's see if with those optimizations we relieve the JVM memory management of much of the stress.

Regards,

Alfonso Nishikawa

On Mon, Jun 10, 2019 at 21:18, Alfonso Nishikawa (<alfonso.nishik...@gmail.com>) wrote:

> Hi, Sheriffo.
>
> You can try reusing the Persistent instances [1] to insert the data. I
> don't know all the backends, but they should be reusable, at least in
> MongoDB and HBase.
>
> [1] -
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L130
>
> Regards,
>
> Alfonso Nishikawa
>
> On Mon, Jun 10, 2019 at 21:14, Alfonso Nishikawa (<alfonso.nishik...@gmail.com>) wrote:
>
>> Hi, Sheriffo.
>>
>> I really don't know how to solve it, but are you setting any Xmx / Xms
>> configuration values?
>>
>> Regards,
>>
>> Alfonso Nishikawa
>>
>> On Sat, Jun 8, 2019 at 16:02, Sheriffo Ceesay wrote:
>>
>>> Hi All,
>>>
>>> Week 2 progress update is available at
>>>
>>> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
>>>
>>> I have one question that I would like my mentors to advise on. I am still
>>> working on it but thought it would be good to report it because it is
>>> HBase specific.
>>>
>>> So the problem has to do with an OutOfMemory error when inserting 1M +
>>> records in HBase. This happens when I try to run the actual benchmark by
>>> first loading HBase with 1 million plus records. It works perfectly for
>>> MongoDB but not HBase.
>>>
>>> So I am assuming this problem is specific to HBase. The stack trace is
>>> given below.
>>>
>>> Exception in thread "Thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>         at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
>>>         at java.lang.StringCoding.encode(StringCoding.java:344)
>>>         at java.lang.String.getBytes(String.java:918)
>>>         at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:733)
>>>         at org.apache.gora.hbase.util.HBaseByteInterface.toBytes(HBaseByteInterface.java:225)
>>>         at org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:383)
>>>         at org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:348)
>>>         at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:319)
>>>         at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:84)
>>>         at org.apache.gora.benchmark.GoraBenchmarkClient.insert(GoraBenchmarkClient.java:141)
>>>         at com.yahoo.ycsb.DBWrapper.insert(DBWrapper.java:148)
>>>         at com.yahoo.ycsb.workloads.CoreWorkload.doInsert(CoreWorkload.java:461)
>>>         at com.yahoo.ycsb.ClientThread.run(Client.java:269)
>>>
>>> The insert implementation of the module available at
>>> https://github.com/sneceesay77/gora/tree/GORA-532/gora-benchmark in
>>> GoraBenchmarkClient.java is very straightforward. I have had a brief look
>>> at the put() implementation in HBaseStore.java but could not find an issue
>>> with that.
>>>
>>> If I solve this problem, then I will run more workloads to verify that
>>> the module is stable for
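Two of the suggestions in this message (precalculating the field names and passing a reusable zero-length array to toArray) can be sketched in isolation. This is not the GoraBenchmarkClient code: the class name and the "field" prefix are assumptions made for the example, and no Avro or Gora types are involved.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class FieldNameCacheSketch {
  // Shared zero-length array: toArray only uses it to learn the element type.
  private static final String[] EMPTY_STRINGS = new String[0];

  private final String[] fieldNames;

  /** Builds every name once, so the hot insert loop does no string concatenation. */
  FieldNameCacheSketch(String prefix, int fieldCount) {
    fieldNames = new String[fieldCount];
    for (int i = 0; i < fieldCount; i++) {
      fieldNames[i] = prefix + i;
    }
  }

  /** Returns the cached name; the same String instance on every call. */
  String nameOf(int index) {
    return fieldNames[index];
  }

  /** Converts a set of field names to an array without pre-sizing it. */
  static String[] toArray(Set<String> fields) {
    return fields.toArray(EMPTY_STRINGS);
  }

  public static void main(String[] args) {
    FieldNameCacheSketch names = new FieldNameCacheSketch("field", 3);
    Set<String> requested =
        new LinkedHashSet<>(Arrays.asList(names.nameOf(0), names.nameOf(2)));
    System.out.println(Arrays.toString(toArray(requested)));
  }
}
```

With 1M inserts times N fields, building the names once turns N million short-lived String allocations into N long-lived ones, which is the kind of pressure on the garbage collector the message is trying to remove.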
Re: Week 2 Report and A Question
Hi, Sheriffo.

You can try reusing the Persistent instances [1] to insert the data. I don't know all the backends, but they should be reusable, at least in MongoDB and HBase.

[1] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L130

Regards,

Alfonso Nishikawa

On Mon, 10 Jun 2019 at 21:14, Alfonso Nishikawa (<alfonso.nishik...@gmail.com>) wrote:

> Hi, Sheriffo.
>
> I really don't know how to solve it, but are you setting any Xmx / Xms
> configuration values?
>
> Regards,
>
> Alfonso Nishikawa
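The suggestion to reuse Persistent instances can be sketched as below. This is only an illustration under assumptions: `UserRecord` and `InMemoryStore` are hypothetical stand-ins, not the benchmark's generated bean or a real Gora DataStore, and reuse is only safe if the store copies or serializes the record inside put() instead of keeping a reference to it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ReusePersistentSketch {

  /** Hypothetical stand-in for a Gora Persistent bean (assumption, not the real class). */
  static final class UserRecord {
    private String userId;
    private String field0;

    void setUserId(String userId) { this.userId = userId; }
    void setField0(String field0) { this.field0 = field0; }
    String getUserId() { return userId; }
    String getField0() { return field0; }

    /** Resets every field so values from the previous insert cannot leak. */
    void clear() {
      userId = null;
      field0 = null;
    }
  }

  /** Hypothetical store that serializes the record during put() and keeps no reference. */
  static final class InMemoryStore {
    private final List<byte[]> rows = new ArrayList<>();

    void put(String key, UserRecord record) {
      String row = key + "|" + record.getUserId() + "|" + record.getField0();
      rows.add(row.getBytes(StandardCharsets.UTF_8));
    }

    int size() { return rows.size(); }

    String row(int i) { return new String(rows.get(i), StandardCharsets.UTF_8); }
  }

  /** Inserts count rows while allocating a single UserRecord for the whole run. */
  static InMemoryStore load(int count) {
    InMemoryStore store = new InMemoryStore();
    UserRecord record = new UserRecord(); // allocated once, reused for every insert
    for (int i = 0; i < count; i++) {
      record.clear();
      record.setUserId("user" + i);
      record.setField0("value" + i);
      store.put("key" + i, record);
    }
    return store;
  }

  public static void main(String[] args) {
    InMemoryStore store = load(3);
    System.out.println(store.size() + " rows, last = " + store.row(2));
  }
}
```

Reuse lowers the allocation rate per insert, which may ease GC pressure. It does not help if the memory is being held by buffered writes inside the store.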
Re: Week 2 Report and A Question
Hi, Sheriffo.

I really don't know how to solve it, but are you setting any Xmx / Xms configuration values?

Regards,

Alfonso Nishikawa

On Sat, 8 Jun 2019 at 16:02, Sheriffo Ceesay wrote:

> Hi All,
>
> Week 2 progress update is available at
>
> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
>
> I have one question that I would like my mentors to advise on. I am still
> working on it, but thought it would be good to report it because it is
> HBase specific.
>
> The problem is an OutOfMemoryError when inserting 1M+ records into HBase.
> This happens when I try to run the actual benchmark by first loading HBase
> with 1 million plus records. It works perfectly for MongoDB but not HBase.
>
> So I am assuming this problem is specific to HBase. The stack trace is
> given below.
>
> Exception in thread "Thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
>     at java.lang.StringCoding.encode(StringCoding.java:344)
>     at java.lang.String.getBytes(String.java:918)
>     at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:733)
>     at org.apache.gora.hbase.util.HBaseByteInterface.toBytes(HBaseByteInterface.java:225)
>     at org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:383)
>     at org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:348)
>     at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:319)
>     at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:84)
>     at org.apache.gora.benchmark.GoraBenchmarkClient.insert(GoraBenchmarkClient.java:141)
>     at com.yahoo.ycsb.DBWrapper.insert(DBWrapper.java:148)
>     at com.yahoo.ycsb.workloads.CoreWorkload.doInsert(CoreWorkload.java:461)
>     at com.yahoo.ycsb.ClientThread.run(Client.java:269)
>
> The insert implementation of the module, available at
> https://github.com/sneceesay77/gora/tree/GORA-532/gora-benchmark in
> GoraBenchmarkClient.java, is very straightforward. I have had a brief look
> at the put() implementation in HBaseStore.java but could not find an issue
> with it.
>
> If I solve this problem, I will run more workloads to verify that the
> module is stable for the basic implementation. Then I will go ahead and
> work on the suggestions made by Renato last week.
>
> Please let me know what your thoughts are.
>
> Thank you.
>
> **Sheriffo Ceesay**
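To answer the Xmx / Xms question it may help to print what the benchmark JVM is actually running with. A minimal sketch using only standard JDK calls; the class name `HeapSettingsSketch` and its helpers are made up for illustration, and how the flags reach the benchmark JVM depends on how it is launched.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

class HeapSettingsSketch {

  /** Converts a byte count to whole mebibytes. */
  static long toMiB(long bytes) {
    return bytes / (1024L * 1024L);
  }

  /** Keeps only the -Xmx / -Xms flags from a list of JVM arguments. */
  static List<String> heapFlags(List<String> jvmArgs) {
    return jvmArgs.stream()
        .filter(arg -> arg.startsWith("-Xmx") || arg.startsWith("-Xms"))
        .collect(Collectors.toList());
  }

  /** Summarizes the heap limits the running JVM reports. */
  static String describeHeap() {
    Runtime rt = Runtime.getRuntime();
    return "max=" + toMiB(rt.maxMemory()) + "MiB"
        + " committed=" + toMiB(rt.totalMemory()) + "MiB"
        + " free=" + toMiB(rt.freeMemory()) + "MiB";
  }

  public static void main(String[] args) {
    // Arguments passed to this JVM on startup, e.g. -Xmx2g
    List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
    System.out.println("heap flags: " + heapFlags(jvmArgs));
    System.out.println(describeHeap());
  }
}
```

Running it as `java -Xms512m -Xmx2g HeapSettingsSketch` should list both flags. An empty list means the JVM is using its default heap size.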
Re: Kudu datastore reports
Hi, John.

Regarding your questions in the report [1]:

- How to represent partitioning configurations on the mapping file.

  This was discussed in other emails, wasn't it? :)

- KuduTestHarness requires the Maven plugin os-maven-plugin, which needs Maven 3.1.1+. Is it a problem for Apache Gora?

  I believe it is not a problem. My Ubuntu comes with 3.6.0, well past 3.1.1, and I assume everyone uses a fairly recent Maven 3 :)

[1] - https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports

Regards,

Alfonso Nishikawa
Re: Kudu datastore reports
Hi, John.

Thank you!
Things I have seen:

- The version of a Maven dependency [1] should go in the Dependency Management section of the root pom [2]. Same for [3] and the dependencies that follow: the version should not be set there.
- Set the scope of the test dependencies to test, at [4] and the ones that follow.
- Set the indentation to 2 spaces for the pom [5].
- Missing "t" in "localhost" at [6].
- Port 13 for Kudu? That is the "Daytime Protocol" (RFC 867), and you will need root permission to run on it. The default port for Kudu is 7051, isn't it?
- I would ask you to add to your KuduStore [8] the same functionality to load the mapping from configuration as in HBase's store [7]. This will have implications for your readMapping at [9], so take a look at the one for HBase at [10].
- I know it is in other backends, but avoid RuntimeExceptions (at least in Java, since we have the checked ones) like in [11]. You can wrap them in GoraException. An example is [12].

And nothing more :)
Keep going, good job.

[1] - https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml#L98
[2] - https://github.com/jhnmora000/gora/blob/GORA-485/pom.xml#L890
[3] - https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml#L121
[4] - https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml#L180
[5] - https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml
[6] - https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/test/resources/gora.properties#L18
[7] - https://github.com/jhnmora000/gora/blob/master/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L92
[8] - https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/store/KuduStore.java#L53
[9] - https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/mapping/KuduMappingBuilder.java#L81
[10] - https://github.com/jhnmora000/gora/blob/master/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L822
[11] - https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/mapping/KuduMappingBuilder.java#L141
[12] - https://github.com/jhnmora000/gora/blob/master/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L268

Regards,

Alfonso Nishikawa

On Sat, 8 Jun 2019 at 20:26, John Mora wrote:

> Hi all.
>
> I have just updated my weekly reports on Cwiki [1]. This next week I think
> I should be focusing on the create schema operation and solving the issue
> of the partitioning configurations in the mapping file.
>
> Please let me know if you have suggestions. My latest commits are
> available here [2].
>
> [1] https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>
> Best,
> John
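The review point about wrapping RuntimeExceptions in a checked exception could look roughly like this. It is a sketch under assumptions: `StoreException` is a local stand-in so the example compiles on its own (the real code would use GoraException), and `parsePort` is a hypothetical helper, not code from KuduMappingBuilder.

```java
class CheckedWrapSketch {

  /** Local stand-in for a checked store-level exception such as GoraException (assumption). */
  static class StoreException extends Exception {
    StoreException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /**
   * Parses a port number from a configuration value (hypothetical helper).
   * Any unchecked failure is rethrown as a checked StoreException with the cause attached.
   */
  static int parsePort(String raw) throws StoreException {
    try {
      int port = Integer.parseInt(raw.trim());
      if (port < 1 || port > 65535) {
        throw new IllegalArgumentException("port out of range: " + port);
      }
      return port;
    } catch (RuntimeException e) {
      // Covers NumberFormatException, the range check above, and a null input
      throw new StoreException("Invalid port value: '" + raw + "'", e);
    }
  }

  public static void main(String[] args) {
    try {
      System.out.println(parsePort("7051"));
      parsePort("not-a-port");
    } catch (StoreException e) {
      System.out.println(e.getMessage() + " (cause: " + e.getCause().getClass().getSimpleName() + ")");
    }
  }
}
```

Callers then have to handle or declare the failure explicitly, and the original exception stays available through getCause().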