Re: [ANN] Simbase: A vector similarity database
Hi, folks,

Simbase v0.1.0-beta1 has just been released! We have fixed many bugs, and the system has been running stably in our own use cases for almost half a year.

In the docs, we added a simple scenario for your reference:

Setup

    bmk b2048 t1 t2 t3 ... t2047 t2048
    vmk b2048 article
    vmk b2048 userprofile
    rmk userprofile article cosinesq

Fill data

    vadd article 1 0.11 0.112 0.1123...
    vadd article 2 0.21 0.212 0.2123...
    ...
    vadd userprofile 1 0.11 0.112 0.1123...
    vadd userprofile 2 0.21 0.212 0.2123...
    ...

Query

    rrec userprofile 2 article

On Sun, Jan 26, 2014 at 9:21 AM, Mingli Yuan mingli.y...@gmail.com wrote:

Hi, folks,

This week we released v0.1.0-alpha3:

* Removed the constraints on vectors; Simbase now supports arbitrary vectors
* Fixed various bugs in the memory structures so that it keeps scaling linearly
* Almost 7 times improvement in performance: it can now handle 100k-dimensional dense vectors in under 0.14 sec on an i7 Mac laptop

From now on, it enters the beta phase. If it is relevant to your work, we encourage you to give it a try and help us find more bugs.

Regards,
Mingli

On Mon, Jan 13, 2014 at 5:55 PM, Mingli Yuan mingli.y...@gmail.com wrote:

Hi, folks,

We have just released an alpha version of Simbase, a vector similarity database that speaks the Redis protocol. Since this is its very first release, we decided to keep it in alpha for now, because we want to hear comments and suggestions for improvement from the community.

Github page
--
https://github.com/guokr/simbase

We introduce the basic idea, limitations, build process and commands there.

Background
--
Simbase is a tool we developed while revising our content recommendation engine. Our document set has 300k docs, and we use LDA to turn them into vectors. But how to compare the 300k vectors was a problem for us at the time. We tried different methods, but the performance was not very good. Since the comparison logic is quite simple, we decided to write a new data store to do the trick.

So far, we are satisfied with its performance. On an i7 MacBook with a set of 120k 1k-dimensional vectors:

- write: about 1 ops per second
- read: up to 1k ops per second

The real read performance may be higher than these results, because our testing method is limited.

Regards,
Mingli

-- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
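As a rough illustration of what the beta scenario asks for, the sketch below ranks stored article vectors by their similarity to one user-profile vector, which is conceptually what `rrec userprofile 2 article` requests. It is not Simbase's implementation: it is a brute-force sketch that assumes plain cosine similarity (the exact definition of the `cosinesq` score is not given here), uses toy 2-dimensional vectors instead of the 2048-dimensional basis, and the names `dot`, `cosine` and `recommend` are hypothetical.

```clojure
(defn dot
  "Dot product of two equal-length numeric vectors."
  [a b]
  (reduce + (map * a b)))

(defn cosine
  "Cosine similarity of a and b; returns 0.0 if either vector is all zeros."
  [a b]
  (let [denom (* (Math/sqrt (dot a a)) (Math/sqrt (dot b b)))]
    (if (zero? denom)
      0.0
      (/ (dot a b) denom))))

(defn recommend
  "Returns the ids of the k candidates most similar to query, best first.
  candidates is a map of id -> vector (a hypothetical in-memory stand-in
  for a Simbase vector set)."
  [query candidates k]
  (->> candidates
       (sort-by (fn [[_ v]] (cosine query v)) >) ; highest similarity first
       (take k)
       (map first)))

;; Toy data: three "articles" and one "user profile" vector.
(recommend [1 0] {1 [1 0], 2 [0 1], 3 [1 1]} 2)
;; => (1 3)
```

A scan like this compares the query against every stored vector on each request, which gets expensive quickly for a set of 300k vectors.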
[ANN] clj-cn-nlp 0.2.1 released
Hi, folks,

We have just released clj-cn-nlp version 0.2.1, a Clojure NLP wrapper based on Stanford CoreNLP for Simplified Chinese users.

Three default Chinese language models are shipped with this wrapper to provide:

- seg: Chinese word segmentation
- ner: Chinese named entity recognition
- tag: Chinese POS tagging

In this release, the API does not change, but we simplified the implementation a lot, removing reflection and the customised class loading from the code; in the meantime, performance should be better than in the previous release.

Add the dependency below to your project.clj:

    [com.guokr/clj-cn-nlp "0.2.1"]

Then, in the REPL, you will see:

    (use 'com.guokr.nlp.seg)
    (use 'com.guokr.nlp.ner)
    (use 'com.guokr.nlp.tag)

    (seg "Clojure是由里奇·希基发明的一门编程语言,它的开发者已经遍布世界,渗入到各个应用领域!")
    Clojure 是 由 里 奇·希基 发明 的 一 门 编程 语言 , 它 的 开发者 已经 遍布 世界 , 渗入 到 各个 应用 领域 !

    (ner "Clojure是由多才多艺的瑞奇·希基发明的一门编程语言,它的开发者已经遍布世界,渗入到各个应用领域!")
    Clojure/O 是/O 由/O 多才多艺/O 的/O 瑞奇/PERSON·/O希基/PERSON 发明/O 的/O 一/O 门/O 编程/O 语言/O ,/O 它/O 的/O 开发者/O 已经/O 遍布/O 世界/O ,/O 渗入/O 到/O 各个/O 应用/O 领域/O !/O

    (tag "Clojure是由多才多艺的瑞奇·希基发明的一门编程语言,它的开发者已经遍布世界,渗入到各个应用领域!")
    Clojure#NR 是#VC 由#P 多才多艺#VV 的#DEC 瑞奇·希基#NR 发明#VV 的#DEC 一#CD 门#M 编程#NN 语言#NN ,#PU 它#PN 的#DEG 开发者#NN 已经#AD 遍布#VV 世界#NN ,#PU 渗入#VV 到#VV 各个#DT 应用#NN 领域#NN !#PU

(The sample sentence reads: "Clojure is a programming language invented by [the versatile] Rich Hickey; its developers are now spread all over the world and have reached every application domain!")

Any comments and bug fixes are welcome!

Regards,
Mingli
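The ner and tag results above are printed as space-separated word/TAG and word#TAG tokens. If you want them as data, a small helper can split them into pairs. This is a hypothetical sketch, not part of the clj-cn-nlp API: it assumes the output is available as a plain string in the printed format (check what seg/ner/tag actually return before relying on it), and tokens that are not separated by spaces are treated as a single token.

```clojure
(require '[clojure.string :as str])

(defn parse-tagged
  "Splits a whitespace-separated string of word<sep>TAG tokens into
  [word tag] pairs, splitting each token at the LAST occurrence of sep.
  Tokens without sep (or starting with it) are skipped."
  [s sep]
  (for [token (str/split (str/trim s) #"\s+")
        :let [i (str/last-index-of token sep)]
        :when (and i (pos? i))]
    [(subs token 0 i) (subs token (+ i (count sep)))]))

(defn words-with-tag
  "Returns, in order, the words whose tag equals tag (e.g. \"PERSON\" or \"NR\")."
  [tagged-output sep tag]
  (->> (parse-tagged tagged-output sep)
       (filter #(= tag (second %)))
       (mapv first)))

;; Usage on a fragment of the tag output shown above:
(words-with-tag "Clojure#NR 是#VC 由#P 多才多艺#VV 的#DEC 瑞奇·希基#NR" "#" "NR")
;; => ["Clojure" "瑞奇·希基"]
```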
Re: [ANN] Simbase: A vector similarity database
Hi, folks,

This week we released v0.1.0-alpha3:

* Removed the constraints on vectors; Simbase now supports arbitrary vectors
* Fixed various bugs in the memory structures so that it keeps scaling linearly
* Almost 7 times improvement in performance: it can now handle 100k-dimensional dense vectors in under 0.14 sec on an i7 Mac laptop

From now on, it enters the beta phase. If it is relevant to your work, we encourage you to give it a try and help us find more bugs.

Regards,
Mingli
Re: Nginx-Clojure Let You Deploy Clojure Web App on Nginx Without Any Java Web Server
Hi, Xfeep,

Thanks for your contribution; the project looks interesting.

For me, the idea of driving a Ring webapp behind nginx is not new. We use uWSGI to drive our Ring app behind nginx in production. uWSGI has supported the JVM and Ring for almost one year, and I think the code is relatively stable by now:

- it supports a native protocol between nginx and uWSGI, which is more efficient than HTTP
- it supports Unix sockets
- it has a rich uWSGI API layer that provides some means of communication between webapps
- according to the performance tests by its author, it is a little bit faster than Jetty

It has been in our production for half a year, is quite stable, and works very harmoniously with our Python app.

I don't want to sell the uWSGI solution, but it is worth taking a look and making some comparisons.

Regards,
Mingli

On Tue, Jan 14, 2014 at 9:12 PM, Xfeep Zhang easyj...@163.com wrote:

You are welcome! Yes, you are right. One JVM instance is embedded per Nginx worker process. The number of Nginx workers is generally the same as the number of CPUs. If one worker crashes, the Nginx master will create a new one, so you don't need to worry about a JVM crashing accidentally. Although there will be several JVM instances, there is only one main thread attached to each Nginx worker process, so each JVM instance uses less memory and has no thread context-switch cost. If, in some cases, you can use only one JVM instance, you can set the number of Nginx workers to 1 and set jvm_workers 1, and nginx-clojure will create a thread pool with a fixed number of threads to handle requests for you.

On Tuesday, January 14, 2014 5:50:34 PM UTC+8, Feng Shen wrote:

Hi, Thanks for your work on nginx-clojure. It looks great! As far as I know, Nginx spawns many processes (correct me if I am wrong); does that mean there will be many JVM processes?

On Tuesday, January 14, 2014 4:44:18 PM UTC+8, Xfeep Zhang wrote:

I have done the first one.
The result is here: https://github.com/ptaoussanis/clojure-web-server-benchmarks

Thanks to Taoussanis for his invitation to the project clojure-web-server-benchmarks (https://github.com/ptaoussanis/clojure-web-server-benchmarks) hosted on Github.

On Tuesday, January 14, 2014 10:31:03 AM UTC+8, Xfeep Zhang wrote:

You're welcome. I think there are several difficult phases:

(1) update the test program in clojure-web-server-benchmarks (https://github.com/ptaoussanis/clojure-web-server-benchmarks), upgrading some packages to the latest versions (e.g. http-kit from 1.3.0-alpha2 to 2.1.16), and add nginx-php testing;
(2) test with real-world content sizes, grouped e.g. into tiny, small, medium and huge;
(3) test real-world connection circumstances, where a lot of connections are inactive but kept open;
(4) try some real asynchronous tests that fetch external resources (e.g. a REST service or a DB) before responding to the client, e.g. using libdrizzle, a non-blocking MySQL client from https://launchpad.net/drizzle.

On Tuesday, January 14, 2014 2:41:50 AM UTC+8, Sergey Didenko wrote:

Looks very interesting, thank you for your work! I wonder how this is going to improve latency compared to nginx + http-kit in some real-world test that does not use heavy DB operations.

On Mon, Jan 13, 2014 at 5:57 AM, Xfeep Zhang easy...@163.com wrote:

So far I have found out why nginx-clojure is slower than http-kit when 1 concurrents (when = 1000 concurrents, nginx-clojure is faster than http-kit). I had set too many connections per nginx worker (worker_connections = 2). This made nginx use only one worker to handle the ab requests (every request is tiny). I plan to take note of c-erlang-java-performance (http://timyang.net/programming/c-erlang-java-performance/) and fork clojure-web-server-benchmarks (https://github.com/ptaoussanis/clojure-web-server-benchmarks) to do some real-world tests.
On Sunday, January 12, 2014 11:21:06 PM UTC+8, Xfeep Zhang wrote:

Sorry for my mistakes!

1. In the static file test, the ring-jetty result is for about 10 concurrents, NOT 1 concurrents (Concurrency Level: 10 in the ab report).
2. In the small string test, all results for the three servers are for about 10 concurrents, NOT 1 concurrents.

Here are the correct results for these two mistakes:

1. static file test
(3) ring-jetty more bad than 10 concurrents
===

    Document Path:          /
    Document Length:        29686 bytes
    *Concurrency Level:     1*
    Time taken for tests:   6.303 seconds
    Complete requests:      10
    Failed requests:        0
    Write errors:           0
    Total transferred:      298220 bytes
    HTML transferred:       296860 bytes
    Requests per second:    15864.43 [#/sec] (mean)
    Time per request:       630.341 [ms] (mean)
    Time per request:       0.063 [ms] (mean, across all concurrent
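For anyone comparing ab reports like the one above: ApacheBench derives its three summary figures from the number of completed requests, the total time taken and the concurrency level, so they should be mutually consistent (requests per second multiplied by the time taken gives back the number of completed requests). The sketch below writes those relationships down; `ab-summary` is a hypothetical helper and the numbers in the usage line are illustrative, not taken from the report.

```clojure
(defn ab-summary
  "Derives ApacheBench-style summary figures from the number of completed
  requests, the total time taken (in seconds) and the concurrency level."
  [complete-requests seconds concurrency]
  {;; throughput over the whole run
   :requests-per-second (/ complete-requests seconds)
   ;; "Time per request (mean)": average latency seen by each of the
   ;; `concurrency` simultaneous clients
   :ms-per-request (/ (* 1000.0 concurrency seconds) complete-requests)
   ;; "Time per request (mean, across all concurrent requests)":
   ;; wall-clock time divided evenly over all requests
   :ms-per-request-all (/ (* 1000.0 seconds) complete-requests)})

(ab-summary 1000 2.0 10)
;; => {:requests-per-second 500.0, :ms-per-request 20.0, :ms-per-request-all 2.0}
```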
[ANN] Simbase: A vector similarity database
Hi, folks,

We have just released an alpha version of Simbase, a vector similarity database that speaks the Redis protocol. Since this is its very first release, we decided to keep it in alpha for now, because we want to hear comments and suggestions for improvement from the community.

Github page
--
https://github.com/guokr/simbase

We introduce the basic idea, limitations, build process and commands there.

Background
--
Simbase is a tool we developed while revising our content recommendation engine. Our document set has 300k docs, and we use LDA to turn them into vectors. But how to compare the 300k vectors was a problem for us at the time. We tried different methods, but the performance was not very good. Since the comparison logic is quite simple, we decided to write a new data store to do the trick.

So far, we are satisfied with its performance. On an i7 MacBook with a set of 120k 1k-dimensional vectors:

- write: about 1 ops per second
- read: up to 1k ops per second

The real read performance may be higher than these results, because our testing method is limited.

Regards,
Mingli
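Since Simbase speaks the Redis protocol, a Redis client (or redis-cli) should be able to send it commands. To show what that means on the wire, here is a sketch that encodes a command as a RESP request, i.e. an array of bulk strings, which is the format Redis clients send. It assumes Simbase accepts standard RESP requests the way a Redis server does; `encode-command` is a hypothetical helper, and in practice you would use an existing client library rather than hand-encoding.

```clojure
(defn encode-command
  "Encodes a command as a RESP array of bulk strings:
  *<argc>\\r\\n followed by $<byte-length>\\r\\n<arg>\\r\\n per argument.
  Lengths are counted in UTF-8 bytes, as the protocol requires."
  [& args]
  (letfn [(bulk [arg]
            (let [^String s (str arg)]
              (str "$" (count (.getBytes s "UTF-8")) "\r\n" s "\r\n")))]
    (apply str "*" (count args) "\r\n" (map bulk args))))

;; The query from the beta scenario, as it would travel over the socket:
(encode-command "rrec" "userprofile" 2 "article")
;; => "*4\r\n$4\r\nrrec\r\n$11\r\nuserprofile\r\n$1\r\n2\r\n$7\r\narticle\r\n"
```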
[ANN] stan-cn-* 0.0.3 and clj-cn-nlp 0.2.0 released
Hi, buddies,

We have released a new version (0.0.3) of the stan-cn-* packages and the corresponding Clojure bindings.

The stan-cn-* packages provide an API wrapper for the Stanford CoreNLP packages, aiming to reduce configuration complexity for Chinese users.

Please check the READMEs below for usage:

* https://github.com/guokr/stan-cn-seg
* https://github.com/guokr/stan-cn-ner
* https://github.com/guokr/stan-cn-tag
* https://github.com/guokr/clj-cn-nlp
* https://github.com/guokr/stan-cn-nlp

Thanks.

Regards,
Mingli
Re: PoC: Combining Wikidata and Clojure logic programming
Thanks very much, David, Timothy and Karsten,

I know some RDF stores like Jena and Stardog, but the reason I want to try Clojure logic programming is its simplicity:

* setup for core.logic is very easy with lein
* no server is needed
* and even at the conceptual level, the Semantic Web is based on Description Logic (http://en.wikipedia.org/wiki/Description_logic), which is purely a matter of logic.

Maybe this simplicity is very nice for some special use cases. But I don't know whether the idea is practical if the triple set is very large. Right now I am downloading the Wikidata database, which contains millions of entities and even more triples. I will try different approaches and benchmark them.

I am new to this area and trying to learn more! Thanks again.

Regards,
Mingli

On Wed, Aug 7, 2013 at 7:53 PM, Karsten Schmidt toxmeis...@gmail.com wrote:

Hi Mingli,

FYI, for the past 3 months I've been working almost full-time on a lightweight, modular RDF Clojure toolkit, which I plan to open-source in the near future, once the core API has solidified more.
The kit so far features:

* core RDF datatype protocols (URIs, blank nodes, literals, containers & XSD type handling via multimethods)
* simple named graphs, datasets of multiple graphs
* protocol-based triple store implementations: in-memory (Clojure data structures), Redis, Cassandra (WIP)
* SPARQL-style query & update engine:
** queries currently expressed as Clojure expressions
** SPARQL syntax parser (WIP)
** customizable query optimizations
** fixed-length property paths
** basic federation queries
** optional queries
** filter expressions, binding injection, grouping, sorting
* graph -> tree mapper to turn a set of triples into nested object maps
* rule-based inferencing
** supplied rule set of common OWL/RDFS semantics
* streaming Turtle & JSON-LD IO, SPARQL result export as CSV, XML & JSON
* customizable CSV -> RDF conversion

Current focus of development:

* SPARQL HTTP endpoint protocol implementation
* Streamed reasoning/inferencing w/ SPARQL-T
* Extend support of OWL semantics in query engine
* SPIN support, allowing queries, constraints & inference rules to be defined in RDF
* async distributed query processor
* Library of AngularJS visualization directives/components of SPARQL results (written in CLJS)

In terms of performance, unfortunately I can't yet share any real benchmark results, since I've only recently started looking into that for some core components, but IMHO things are looking promising (and obviously still have lengths to go). E.g. using the in-memory store, the standard LUBM dataset with 1 uni (105k triples) loads in avg. 4.8 secs on a 2010 MBP. With the Redis store (using the fabulous Carmine lib), the same loads in under 11 secs, but I know this will be a lot faster once I've switched to batching. So far the query engine has only been tested with smaller datasets (around 20k triples), and medium-complex queries w/ around a dozen graph patterns (incl. paths & optional queries) and hundreds of results complete in 100 ms.
I will announce the release on this list once I'm comfortable with the basic setup & have spent some quality time on documentation...

On 5 Aug 2013 18:13, Timothy Baldridge tbaldri...@gmail.com wrote:

This looks like a re-implementation of many of the goals of Datomic. Perhaps you can use Datomic as a datastore, and then use Datomic's datalog, or a custom query engine (such as core.logic, https://github.com/clojure/core.logic/blob/master/src/main/clojure/clojure/core/logic/datomic.clj) to do your queries?

Timothy

On Mon, Aug 5, 2013 at 10:52 AM, David Nolen dnolen.li...@gmail.com wrote:

Very interesting. The rel feature is really still a bit of an experimental thing, and we'd like to replace it eventually with something less problematic like pldb (http://github.com/threatgrid/pldb). Still, core.logic isn't really a database, and your needs may be better served by something with different goals.

David

On Mon, Aug 5, 2013 at 12:41 PM, Mingli Yuan mingli.y...@gmail.com wrote:

Hi, folks,

After a night of quick work, I have put together a proof of concept demonstrating that we can combine Wikidata and Clojure logic programming.

The source code is here: https://github.com/mountain/knowledge
An example of an entity: https://github.com/mountain/knowledge/blob/master/src/entities/albert_einstein.clj
Example of types: https://github.com/mountain/knowledge/blob/master/src/meta/types.clj
Example of predicates: https://github.com/mountain/knowledge/blob/master/src/meta/properties.clj
Example of inference: https://github.com/mountain/knowledge/blob/master/test/knowledge/test.clj

We also found it is very easy to get language versions other than English.

Since I am new to Clojure logic programming, I have some questions about the approach I took:

- What will happen when we have millions of triples? Should I take another
PoC: Combining Wikidata and Clojure logic programming
Hi, folks,

After a night of quick work, I have put together a proof of concept demonstrating that we can combine Wikidata and Clojure logic programming.

The source code is here: https://github.com/mountain/knowledge
An example of an entity: https://github.com/mountain/knowledge/blob/master/src/entities/albert_einstein.clj
Example of types: https://github.com/mountain/knowledge/blob/master/src/meta/types.clj
Example of predicates: https://github.com/mountain/knowledge/blob/master/src/meta/properties.clj
Example of inference: https://github.com/mountain/knowledge/blob/master/test/knowledge/test.clj

We also found it is very easy to get language versions other than English.

Since I am new to Clojure logic programming, I have some questions about the approach I took:

- What will happen when we have millions of triples? Should I take another approach, using some RDF store?
- How much memory will it cost?
- How about the performance?
- What about loading one million Clojure source files or Java class files?

I hope you can give some helpful comments. Thanks in advance.

Regards,
Mingli
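To make the question about millions of triples concrete, here is a minimal, hand-rolled sketch of matching SPARQL-like triple patterns against an in-memory set of [subject predicate object] vectors. It is not core.logic and not how the PoC in the repository above works; the functions and the toy data (with hypothetical keyword identifiers rather than real Wikidata IDs) exist only for illustration.

```clojure
(defn variable?
  "Query variables are symbols whose names start with ?, e.g. ?who."
  [x]
  (and (symbol? x) (= \? (first (name x)))))

(defn match-pattern
  "Matches one [s p o] pattern against one triple, extending the binding
  map b. Returns nil when the triple does not match."
  [b pattern triple]
  (reduce (fn [b [pat value]]
            (cond
              (not (variable? pat)) (if (= pat value) b (reduced nil))
              (contains? b pat)     (if (= (b pat) value) b (reduced nil))
              :else                 (assoc b pat value)))
          b
          (map vector pattern triple)))

(defn query
  "Naive nested-loop join: returns every binding map that satisfies all
  patterns. Each pattern scans the whole triple set once per binding."
  [triples patterns]
  (reduce (fn [bindings pattern]
            (for [b bindings
                  t triples
                  :let [b2 (match-pattern b pattern t)]
                  :when b2]
              b2))
          [{}]
          patterns))

;; Toy data with hypothetical identifiers.
(def triples
  #{[:albert-einstein :instance-of :human]
    [:albert-einstein :field :physics]
    [:marie-curie :instance-of :human]
    [:marie-curie :field :physics]
    [:marie-curie :field :chemistry]})

(query triples '[[?who :field :physics] [?who :field :chemistry]])
;; => ({?who :marie-curie})
```

Because every pattern scans all triples for every partial binding, the cost of this naive approach grows with the size of the triple set; that is the scaling concern raised in the questions, and it is why RDF stores and databases index their triples instead of scanning them.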
CoderPost: Programmer's daily digest compiled by machine and human
Sorry for the spam.

I recently launched a paper.li site with compiled news on programming topics. The news comes from all the related topics on pinboard.in. I think the quality is OK, even better than I originally expected. So I hope you will take a look if you are interested.

http://coderpost.org/

Regards,
Mingli
[ANN] uWSGI support for ring (early stage)
Hi, folks,

Yesterday uWSGI released a Ring plugin that gives basic support for Clojure web development.

- https://uwsgi-docs.readthedocs.org/en/latest/Ring.html
- https://uwsgi-docs.readthedocs.org/en/latest/JVM.html
- http://lists.unbit.it/pipermail/uwsgi/2013-March/005549.html
- http://lists.unbit.it/pipermail/uwsgi/2013-March/005562.html

It is still at an early stage and not production-ready, but we plan to evolve it into a mature one. JVM and Ring support has been put on the roadmap for the next versions of uWSGI. Thanks for the great support from the unbit team and Roberto, the original author of uWSGI.

The reasons we, a small team in Beijing, adopted a C-based web container are as follows:

- We use both Python and Clojure heavily.
- uWSGI works with nginx smoothly.
- uWSGI is easy to configure and manage.
- We are open source supporters.

In the next few weeks, we will test this Ring implementation thoroughly. Any comments and participation are welcome!

Thanks.

Regards,
Mingli
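For readers new to Ring: a Ring application is just a function from a request map to a response map, and that function is what a Ring container such as the uWSGI plugin serves. How the plugin is told which handler to load is covered in the docs linked above and is not shown here; the handler below is a hypothetical example.

```clojure
(defn handler
  "A minimal Ring handler: takes a request map, returns a response map."
  [request]
  (if (= "/" (:uri request))
    {:status  200
     :headers {"Content-Type" "text/plain; charset=utf-8"}
     :body    "Hello from Ring"}
    {:status  404
     :headers {"Content-Type" "text/plain; charset=utf-8"}
     :body    "Not found"}))

;; Calling it directly, the same way a server adapter would:
(handler {:request-method :get :uri "/"})
```

Because the handler is a plain function, it can be tested without any server, and the same function can be served by other Ring adapters (e.g. Jetty or http-kit) when comparing containers.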