[arangodb-google] Re: Faceted Search Performance

2017-09-18 Thread Jan
Hi,

Yes, with the query below it will scan the collection 4 times, once for 
each subquery.
Short-term, I do not see any way to speed that up with existing AQL.
Long-term, there are a few ways to handle this: parallelize the scanning, or 
collect and count the same input by multiple criteria with a single 
COLLECT clause. 
Parallelizing would still mean 4 scans, but they would run in parallel, which 
could reduce latency quite a bit. Collecting and counting the same input by 
multiple criteria looks a bit more attractive, but does not really fit into 
the way the AQL pipelines are currently set up.

Another alternative is to split the query on the client side: run 4 
individual AQL queries, collect their results, and aggregate them in the 
client. That will work, but it requires quite a bit more logic in the 
client/application code.
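
A rough sketch of that client-side split, in Python, could look like the following. It is only an illustration under assumptions: `run_query` is a hypothetical callable standing in for whatever ArangoDB driver the application uses, `fetch_facets` and `fake_run_query` are made-up names, and the per-facet query is derived from the COLLECT subqueries in the original query. The attribute name is passed as a bind parameter and accessed as `a[@attribute]`. The queries are submitted from a thread pool, so the 4 scans still happen but may overlap in time.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# One small AQL query per facet; the attribute name is a bind parameter.
FACET_QUERY = """
FOR a IN Asset
  COLLECT attr = a[@attribute] WITH COUNT INTO length
  RETURN { value: attr, count: length }
"""


def fetch_facets(run_query, attributes, max_workers=4):
    """Run one facet query per attribute concurrently and merge the results.

    run_query is a hypothetical callable (query, bind_vars) -> list of dicts;
    it is an assumption of this sketch, not a real driver API.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            attr: pool.submit(run_query, FACET_QUERY, {"attribute": attr})
            for attr in attributes
        }
        # Collect each facet's result list under its attribute name.
        return {attr: fut.result() for attr, fut in futures.items()}


# In-memory stand-in for a real server so the sketch runs on its own.
SAMPLE_ASSETS = [
    {"attribute1": "red", "attribute2": "S"},
    {"attribute1": "red", "attribute2": "M"},
    {"attribute1": "blue", "attribute2": "M"},
]


def fake_run_query(query, bind_vars):
    """Mimic the facet query by counting values of the requested attribute."""
    counts = Counter(doc.get(bind_vars["attribute"]) for doc in SAMPLE_ASSETS)
    return [{"value": v, "count": c} for v, c in sorted(counts.items())]


facets = fetch_facets(fake_run_query, ["attribute1", "attribute2"])
```

Sorting and paging each facet (the SORT and LIMIT parts of the original query) would then either stay in each per-facet AQL query or move into the application code.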

Best regards
Jan 


On Monday, 18 September 2017 at 22:50:20 UTC+2, Roman Kuzmik wrote:
>
> 4 seconds per facet, so adding 3 more takes us to 16 seconds.
> By the way, why is that? ArangoDB is doing a full scan anyway. Is it doing 
> it 4 times with the query below? Is there any way to make it smarter?

-- 
You received this message because you are subscribed to the Google Groups 
"ArangoDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to arangodb+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[arangodb-google] Re: Faceted Search Performance

2017-09-18 Thread Roman Kuzmik
4 seconds per facet, so adding 3 more takes us to 16 seconds.
By the way, why is that? ArangoDB is doing a full scan anyway. Is it doing it 
4 times with the query below? Is there any way to make it smarter?

LET docs = (
  FOR a IN Asset
    RETURN a
)
LET attribute1 = (
  FOR a IN docs
    COLLECT attr = a.attribute1 WITH COUNT INTO length
    RETURN { value: attr, count: length }
)
LET attribute2 = (
  FOR a IN docs
    COLLECT attr = a.attribute2 WITH COUNT INTO length
    RETURN { value: attr, count: length }
)
LET attribute3 = (
  FOR a IN docs
    COLLECT attr = a.attribute3 WITH COUNT INTO length
    RETURN { value: attr, count: length }
)
LET attribute4 = (
  FOR a IN docs
    COLLECT attr = a.attribute4 WITH COUNT INTO length
    RETURN { value: attr, count: length }
)
RETURN {
  counts: (RETURN {
    total: LENGTH(docs),
    offset: 2,
    to: 4,
    facets: {
      attribute1: {
        from: 0,
        to: 5,
        total: LENGTH(attribute1)
      },
      attribute2: {
        from: 5,
        to: 10,
        total: LENGTH(attribute2)
      },
      attribute3: {
        from: 0,
        to: 1000,
        total: LENGTH(attribute3)
      },
      attribute4: {
        from: 0,
        to: 1000,
        total: LENGTH(attribute4)
      }
    }
  }),
  items: (FOR a IN docs LIMIT 2, 4 RETURN { id: a._id, name: a.name }),
  facets: {
    attribute1: (FOR a IN attribute1 SORT a.count LIMIT 0, 5 RETURN a),
    attribute2: (FOR a IN attribute2 SORT a.value LIMIT 5, 10 RETURN a),
    attribute3: (FOR a IN attribute3 LIMIT 0, 1000 RETURN a),
    attribute4: (FOR a IN attribute4 SORT a.count, a.value LIMIT 0, 1000 RETURN a)
  }
}




[arangodb-google] Re: Faceted Search Performance

2017-09-18 Thread Roman Kuzmik
I compiled your changes from feature/mmfiles-hash-lookup-performance.
Indeed, on a single facet we are down to 4 seconds from 6 seconds (in the 
test case provided above), and no indexes are needed.

I hope it makes it to master soon.
