Re: [ANN] Simbase: A vector similarity database

2014-06-04 Thread Mingli Yuan
Hi, folks,

Simbase v0.1.0-beta1 has just been released! We have fixed many bugs, and the
system has been very stable for almost half a year in our use cases.

In the docs, we added a simple scenario for your reference:

Setup

> bmk b2048 t1 t2 t3 ... t2047 t2048
> vmk b2048 article
> vmk b2048 userprofile
> rmk userprofile article cosinesq

Fill data

> vadd article 1 0.11 0.112 0.1123...
> vadd article 2 0.21 0.212 0.2123...
...

> vadd userprofile 1 0.11 0.112 0.1123...
> vadd userprofile 2 0.21 0.212 0.2123...
...

Query

> rrec userprofile 2 article
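
For readers who want to see what such a recommendation query computes, here is a minimal Python sketch. It assumes that "cosinesq" means the squared cosine similarity between two vectors, and that a recommendation is simply the articles ranked by that score against one user profile. The names cosine_sq and recommend are hypothetical, and this is a brute-force illustration, not Simbase's actual implementation.

```python
def cosine_sq(a, b):
    """Squared cosine similarity of two equal-length vectors (assumed meaning of 'cosinesq')."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a)
    norm_b = sum(y * y for y in b)
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return (dot * dot) / (norm_a * norm_b)


def recommend(profile, articles, top_n=2):
    """Rank article ids by squared cosine similarity to one profile vector."""
    scored = [(article_id, cosine_sq(profile, vec)) for article_id, vec in articles.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Illustrative toy data, not taken from a real Simbase instance.
articles = {1: [1.0, 0.0, 0.0], 2: [0.0, 1.0, 0.0], 3: [1.0, 1.0, 0.0]}
profile = [1.0, 0.2, 0.0]
print(recommend(profile, articles))
```

A brute-force scan like this is linear in the number of articles per query, which is the kind of cost a dedicated store tries to avoid by keeping scores up to date as vectors are written.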





-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[ANN] clj-cn-nlp 0.2.1 released

2014-04-18 Thread Mingli Yuan
Hi, folks, we just released clj-cn-nlp version 0.2.1, a Clojure NLP wrapper
based on Stanford CoreNLP for Simplified Chinese users.

Three default Chinese language models are shipped with this wrapper to
provide:

   - seg: Chinese word segmentation
   - ner: Chinese named entity recognition
   - tag: Chinese POS tagging

In this release, the API does not change, but we simplified the
implementation a lot and removed reflection and customised class loading
from the code. Performance should also be better than in the previous
release.

Add the dependency below to your project.clj:

[com.guokr/clj-cn-nlp "0.2.1"]


Then, in the REPL, you will see:


> (use 'com.guokr.nlp.seg)

> (use 'com.guokr.nlp.ner)

> (use 'com.guokr.nlp.tag)

> (seg "Clojure是由里奇·希基发明的一门编程语言,它的开发者已经遍布世界,渗入到各个应用领域!")

> "Clojure 是 由 里 奇·希基 发明 的 一 门 编程 语言 , 它 的 开发者 已经 遍布 世界 , 渗入 到 各个 应用 领域 !"

> (ner "Clojure是由多才多艺的瑞奇·希基发明的一门编程语言,它的开发者已经遍布世界,渗入到各个应用领域!")

> "Clojure/O 是/O 由/O 多才多艺/O 的/O 瑞奇/PERSON·/O希基/PERSON 发明/O 的/O 一/O 门/O 编程/O 
> 语言/O ,/O 它/O 的/O 开发者/O 已经/O 遍布/O 世界/O ,/O 渗入/O 到/O 各个/O 应用/O 领域/O !/O"

> (tag "Clojure是由多才多艺的瑞奇·希基发明的一门编程语言,它的开发者已经遍布世界,渗入到各个应用领域!")

> "Clojure#NR 是#VC 由#P 多才多艺#VV 的#DEC 瑞奇·希基#NR 发明#VV 的#DEC 一#CD 门#M 编程#NN 语言#NN 
> ,#PU 它#PN 的#DEG 开发者#NN 已经#AD 遍布#VV 世界#NN ,#PU 渗入#VV 到#VV 各个#DT 应用#NN 领域#NN 
> !#PU"
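
If the tagged string needs further processing, it can be split back into (word, tag) pairs. The Python sketch below assumes only the output format shown above, that is, whitespace-separated tokens with "#" between the word and its POS tag. The helper name parse_tagged is hypothetical and is not part of clj-cn-nlp.

```python
def parse_tagged(text, sep="#"):
    """Split 'word#TAG word#TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for token in text.split():
        word, found, tag = token.rpartition(sep)
        if not found:
            # Token without a separator: keep the word, leave the tag empty.
            pairs.append((token, None))
        else:
            pairs.append((word, tag))
    return pairs


# A fragment of the tagger output shown above.
sample = "Clojure#NR 是#VC 由#P 多才多艺#VV 的#DEC 瑞奇·希基#NR"
for word, tag in parse_tagged(sample):
    print(word, tag)
```

Splitting on the last separator keeps a word intact even if the word itself happens to contain "#".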


Any comments and bug fixes are welcome!


Regards,

Mingli



Re: [ANN] Simbase: A vector similarity database

2014-01-25 Thread Mingli Yuan
Hi, folks,

This week we released v0.1.0-alpha3:

* Removed constraints on vectors; Simbase now supports arbitrary vectors
* Fixed various bugs in the memory structures to keep scaling linear
* Almost 7 times improvement in performance; right now it can handle 100k
dimensional dense vectors in under 0.14 sec on an i7-CPU Mac laptop.

From now on, it enters the beta phase. If it is relevant to your work, we
encourage you to give it a try and help us find more bugs.

Regards,
Mingli





Re: Nginx-Clojure Let You Deploy Clojure Web App on Nginx Without Any Java Web Server

2014-01-14 Thread Mingli Yuan
Hi, Xfeep,

Thanks for your contribution; the project looks interesting.

For me, the idea of running a Ring webapp behind nginx is not new.
We use uWSGI to run our Ring app behind nginx in production.
uWSGI has supported the JVM and Ring for almost a year, and I think the code
is relatively stable by now.

- it supports a native protocol between nginx and uWSGI which is more
efficient than HTTP
- it supports Unix sockets
- it has a rich uWSGI API layer that provides ways to communicate between
webapps
- according to the performance tests by the author, it is a little bit
faster than Jetty.
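
To give a feel for why the native protocol is cheaper to handle than HTTP, here is a minimal Python sketch of how a request could be framed in the uwsgi wire format as I understand it from the uWSGI documentation: a 4-byte header (modifier1, 16-bit little-endian body size, modifier2) followed by length-prefixed key/value pairs. The function name uwsgi_packet is hypothetical, and the sketch is an illustration only, not code taken from nginx or uWSGI.

```python
import struct


def uwsgi_packet(variables, modifier1=0, modifier2=0):
    """Build a uwsgi-style packet: 4-byte header plus a key/value block."""
    body = b""
    for key, value in variables.items():
        k = key.encode("utf-8")
        v = value.encode("utf-8")
        # Each key and each value is prefixed by its 16-bit little-endian length.
        body += struct.pack("<H", len(k)) + k + struct.pack("<H", len(v)) + v
    # Header: modifier1 (1 byte), body size (16-bit LE), modifier2 (1 byte).
    header = struct.pack("<BHB", modifier1, len(body), modifier2)
    return header + body


packet = uwsgi_packet({"REQUEST_METHOD": "GET", "PATH_INFO": "/"})
print(len(packet), packet[:4])
```

Because every field carries its length up front, the receiving side can read the request variables without any text parsing, which is the main difference from proxying plain HTTP.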

It has been in our production for half a year, is quite stable, and
coexists well with our Python app.

I do not want to sell the uWSGI solution, but it is worth taking a look and
making some comparisons.

Regards,
Mingli


On Tue, Jan 14, 2014 at 9:12 PM, Xfeep Zhang  wrote:

> You are welcome!
>
> Yes, you are right. One JVM instance is embedded per Nginx worker process.
> The number of Nginx workers is generally the same as the number of CPUs.
>
> If one worker crashes, the Nginx master will create a new one, so you don't
> need to worry about the JVM crashing accidentally.
>
> Although there will be several JVM instances, there is only one main
> thread attached to each Nginx worker process.
>
> So each JVM instance uses less memory and has no thread context switch
> cost.
>
> In some cases, if you can use only one JVM instance, you can set the
> Nginx worker number to 1 and set jvm_workers > 1; nginx-clojure will then
> create a thread pool with a fixed number of threads to handle requests
> for you.
>
>
>
> On Tuesday, January 14, 2014 5:50:34 PM UTC+8, Feng Shen wrote:
>>
>> Hi,
>>
>> Thanks for your work on nginx-clojure. It looks great!
>>
>> As far as I know, Nginx spawns many processes (correct me if I am wrong).
>> Does that mean there will be many JVM processes?
>>
>>
>>
>>
>> On Tuesday, January 14, 2014 4:44:18 PM UTC+8, Xfeep Zhang wrote:
>>>
>>>
>>> I have done the first one. The result is here
>>> ( https://github.com/ptaoussanis/clojure-web-server-benchmarks ).
>>> Thanks to Taoussanis for his invitation to the project
>>> clojure-web-server-benchmarks, hosted on GitHub.
>>>
>>> On Tuesday, January 14, 2014 10:31:03 AM UTC+8, Xfeep Zhang wrote:

>>>> You're welcome.
>>>>
>>>> I think there are several difficult phases:
>>>>
>>>> (1) update the test program in clojure-web-server-benchmarks and
>>>> bring some packages up to the latest version (e.g. http-kit from
>>>> 1.3.0-alpha2 --> 2.1.16), and add nginx-php testing
>>>> (2) test real-world content sizes by group, e.g. tiny, small,
>>>> medium, huge.
>>>> (3) test real-world connection circumstances where a lot of
>>>> connections are inactive but kept open.
>>>> (4) try some real asynchronous tests that fetch external resources
>>>> (e.g. REST service, DB) before responding to the client, e.g. using
>>>> libdrizzle, a non-blocking MySQL client from
>>>> https://launchpad.net/drizzle
>>>>
>>>> On Tuesday, January 14, 2014 2:41:50 AM UTC+8, Sergey Didenko wrote:
>>>>>
>>>>> Looks very interesting, thank you for your work!
>>>>>
>>>>> I wonder how this is going to improve latency in comparison to nginx +
>>>>> http-kit for some real-world test that is not using heavy DB operations.
>>>>>
>>>>>
>>>>> On Mon, Jan 13, 2014 at 5:57 AM, Xfeep Zhang  wrote:
>>>>>
>>>>>>
>>>>>> So far I have found why nginx-clojure is slower than http-kit at
>>>>>> 1 concurrents (at <= 1000 concurrents nginx-clojure is faster than
>>>>>> http-kit).
>>>>>> I had set too many connections per nginx worker (worker_connections
>>>>>> = 2). This makes nginx use only one worker to handle ab requests
>>>>>> (every request is tiny).
>>>>>> I plan to take note of c-erlang-java-performance and fork
>>>>>> clojure-web-server-benchmarks to do some real-world tests.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sunday, January 12, 2014 11:21:06 PM UTC+8, Xfeep Zhang wrote:
>>>>>>>
>>>>>>> Sorry for my mistake!
>>>>>>>
>>>>>>> 1. In the static file test, the ring-jetty result is for about 10
>>>>>>> concurrents, NOT 1 concurrents ("Concurrency Level:  10" in the
>>>>>>> ab report).
>>>>>>> 2. In the small string test, all results for the three servers are
>>>>>>> for about 10 concurrents, NOT 1 concurrents.
>>>>>>>
>>>>>>> Here are the correct results for these two mistakes:
>>>>>>>
>>>>>>> 1. static file test
>>>>>>>
>>>>>>> (3) ring-jetty, worse than with 10 concurrents
>>>>>>>
>>>>>>> ===
>>>>>>> Document Path:  /
>>>>>>> Document Length:

[ANN] Simbase: A vector similarity database

2014-01-13 Thread Mingli Yuan
Hi, folks,

We have just released an alpha version of Simbase, a vector similarity
database that speaks the Redis protocol. Since this is its very first
release, we decided to keep it in alpha for now, because we want to hear
comments and suggestions for improvement from the community.
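
Speaking the Redis protocol means that an ordinary Redis client should be able to send commands to the server. As a rough illustration, the Python sketch below frames a command as a RESP array of bulk strings, which is how Redis clients normally encode requests. The helper name encode_command is hypothetical, and the command used is only an example of a Simbase-style query; which commands a given Simbase version accepts is documented on the project page.

```python
def encode_command(*parts):
    """Frame a command as a RESP array of bulk strings."""
    chunks = [b"*%d\r\n" % len(parts)]  # array header: number of elements
    for part in parts:
        data = part.encode("utf-8")
        # Bulk string: "$<byte length>\r\n<bytes>\r\n"
        chunks.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(chunks)


# Example query in the style of the Simbase command set.
wire = encode_command("rrec", "userprofile", "2", "article")
print(wire)
```

The resulting bytes can be written to a TCP socket connected to the server, or the same command can be issued through any client library that allows sending arbitrary commands.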

Github page
--

https://github.com/guokr/simbase

We introduce the basic idea, limitations, build process and commands there.

Background
--

Simbase is a tool we developed while revising our content recommendation
engine.

Our document set has 300k docs, and we use LDA to turn them into vectors.
But how to compare the 300k vectors was a problem for us at the time. We
tried different methods, but the performance was not very good.

Since the comparison logic is quite simple, we decided to write a new data
store to do the trick.

So far, we are satisfied with its performance. On an i7 MacBook with a set
of 120k 1k-dimensional vectors:

   - write: about 1 ops per second
   - read: up to 1k ops per second

The real read performance may be higher than the current result, because
our testing method is limited.

Regards,

Mingli



[ANN] stan-cn-* 0.0.3 and clj-cn-nlp 0.2.0 released

2013-10-21 Thread Mingli Yuan
Hi, buddies,

We have released a new version (0.0.3) of the stan-cn-* packages and the
corresponding Clojure bindings. The stan-cn-* packages provide an API wrapper
for the Stanford CoreNLP packages, aiming to reduce the configuration
complexity for Chinese users.

Please check the READMEs below for usage:

* https://github.com/guokr/stan-cn-seg
* https://github.com/guokr/stan-cn-ner
* https://github.com/guokr/stan-cn-tag
* https://github.com/guokr/clj-cn-nlp
* https://github.com/guokr/stan-cn-nlp

Thanks.

Regards,
Mingli



Re: PoC: Combining Wikidata and Clojure logic programming

2013-08-07 Thread Mingli Yuan
Thanks very much, David, Timothy and Karsten,

I know of some RDF stores like Jena or Stardog, but the reason I want to
try Clojure logic programming is its simplicity:

* setting up core.logic is very easy with lein
* no server is needed
* and even at the conceptual level, the Semantic Web is based on Description
logic <http://en.wikipedia.org/wiki/Description_logic>, which is purely a
matter of logic.

Maybe the simplicity is very nice for some special use cases.
But I don't know whether the idea is practical if the triple set is very
large.

Right now I am downloading the Wikidata database, which contains millions of
entities and even more triples.
I will try different approaches and benchmark them.

I am new to this area, and trying to learn more! Thanks again.

Regards,
Mingli




On Wed, Aug 7, 2013 at 7:53 PM, Karsten Schmidt wrote:

> Hi Mingli,
>
> FYI for the past 3 months I've been working almost fulltime on a
> lightweight, modular RDF Clojure toolkit, which I plan to opensource in the
> near future, once the core API has more solidified. The kit so far features:
>
> * core RDF datatype protocols (URIs, blank nodes, literals, containers &
> XSD type handling via multimethods)
> * simple & named graphs, datasets of multiple graphs
> * protocol based triple store implementations: in-memory (Clojure data
> structures), Redis, Cassandra (WIP)
> * SPARQL style query & update engine:
>   ** queries currently expressed as Clojure expressions
>   ** SPARQL syntax parser (WIP)
>   ** customizable query optimizations
>   ** fixed-length property paths
>   ** basic federation queries
>   ** optional queries
>   ** filter expressions, binding injection, grouping, sorting
> * graph -> tree mapper to turn a set of triples into nested object maps
> * rule based inferencing
>   ** supplied rule set of common OWL/RDFS semantics
> * streaming Turtle & JSON-LD IO, SPARQL result export as CSV, XML & JSON
> * customizable CSV -> RDF conversion
>
> Current focus of development:
> * SPARQL HTTP endpoint & protocol implementation
> * Streamed reasoning/inferencing w/ SPARQL-T
> * Extend support of OWL semantics in query engine
> * SPIN support, allowing queries, constraints & inference rules to be
> defined in RDF
> * async distributed query processor
> * Library of AngularJS visualization directives/components of SPARQL
> results (written in CLJS)
>
> In terms of performance, I unfortunately can't share any real benchmark
> results yet, since I've only recently started looking into that for some
> core components, but IMHO things are looking promising (and obviously
> still have some way to go). E.g. using the in-memory store, the standard
> LUBM dataset with 1 uni & 105k triples loads in avg. 4.8 secs on a 2010
> MBP. With the Redis store (using the fabulous Carmine lib), the same loads
> in under 11 secs, but I know this will be a lot faster once I've switched
> to batching. So far the query engine has only been tested with smaller
> datasets (around 20k triples); medium complex queries w/ around a dozen
> graph patterns (incl. paths & optional queries) and hundreds of results
> complete in < 100 ms.
>
> I will announce the release on this list once I'm comfortable with the
> basic setup & have spent some quality time on documentation...
> On 5 Aug 2013 18:13, "Timothy Baldridge"  wrote:
>
>> This looks like a re-implementation of many of the goals of Datomic. Perhaps
>> you can use Datomic as a datastore, and then use Datomic's datalog, or a
>> custom query engine (such as core.logic
>> https://github.com/clojure/core.logic/blob/master/src/main/clojure/clojure/core/logic/datomic.clj)
>> to do your queries?
>>
>> Timothy
>>
>>
>> On Mon, Aug 5, 2013 at 10:52 AM, David Nolen wrote:
>>
>>> Very interesting. The rel feature is really still a bit of an
>>> experimental thing and we'd like to replace it eventually with something
>>> less problematic like pldb http://github.com/threatgrid/pldb.
>>>
>>> Still, core.logic isn't really a database and your needs may be better
>>> served by something with different goals.
>>>
>>> David
>>>
>>>
>>> On Mon, Aug 5, 2013 at 12:41 PM, Mingli Yuan wrote:
>>>
>>>> Hi, folks,
>>>>
>>>> After one night of quick work, I have put together a proof-of-concept
>>>> to demonstrate that it is feasible to combine Wikidata and Clojure logic
>>>> programming.
>>>>
>>>> The source code is here:
>>>> https://github.com/mountain/knowledge
>>>>
>>>> An e

PoC: Combining Wikidata and Clojure logic programming

2013-08-05 Thread Mingli Yuan
Hi, folks,

After one night of quick work, I have put together a proof-of-concept to
demonstrate that it is feasible to combine Wikidata and Clojure logic
programming.

The source code is here:
https://github.com/mountain/knowledge

An example of an entity:
https://github.com/mountain/knowledge/blob/master/src/entities/albert_einstein.clj

Example of types:
https://github.com/mountain/knowledge/blob/master/src/meta/types.clj

Example of predicates:
https://github.com/mountain/knowledge/blob/master/src/meta/properties.clj

Example of inference:
https://github.com/mountain/knowledge/blob/master/test/knowledge/test.clj

Also, we found it is very easy to get versions in languages other than
English.

Since I am new to Clojure logic programming, I have questions about the
approach I am taking: what will happen when we have millions of triples?
Should I take another approach and use an RDF store?

   - How much memory will it cost?
   - How about the performance?
   - How about the loading process for one million Clojure source files or
   Java class files?
I hope you can give some helpful comments. Thanks in advance.

Regards,
Mingli





CoderPost: Programmer's daily digest compiled by machine and human

2013-07-27 Thread Mingli Yuan
Sorry for spamming,

Recently I launched a paper.li site that compiles news on programming topics.
The news is sourced from all the related topics on pinboard.in.

I think the quality is OK, and even better than I originally expected.
So I hope you will take a look if you are interested.

http://coderpost.org/

regards,

Mingli





[ANN] uWSGI support for ring (early stage)

2013-03-08 Thread Mingli Yuan
Hi, folks,

Yesterday uWSGI released a Ring plugin that gives basic support for
Clojure web development.

   - https://uwsgi-docs.readthedocs.org/en/latest/Ring.html
   - https://uwsgi-docs.readthedocs.org/en/latest/JVM.html
   - http://lists.unbit.it/pipermail/uwsgi/2013-March/005549.html
   - http://lists.unbit.it/pipermail/uwsgi/2013-March/005562.html

It is still at an early stage and not production-ready, but we plan to
evolve it to maturity. JVM and Ring support has been put on the roadmap for
the next few versions of uWSGI. Thanks for the great support from the unbit
team and Roberto, the original author of uWSGI.

The reasons we, a small team in Beijing, adopted a C-based web container are
as follows:

   - We use both Python and Clojure heavily.
   - uWSGI works smoothly with nginx.
   - uWSGI is easy to configure and manage.
   - We are open source supporters.

In the next few weeks, we will test this Ring implementation thoroughly.

Any comments and participation are welcome!

Thanks.

Regards,
Mingli
