Does anyone know the recommended way for an ODB client to receive a large query result (>10000 records)?
OSQLAsynchQuery seemed like an alternative to pagination (which produces odd behavior:
<http://stackoverflow.com/questions/37731791/orientdb-automatic-pagination-returning-duplicate-result-short-pages-and-an-in>),
but OSQLAsynchQuery gives me this distressing warning when I ask for more than 10000 results:

INFO: {db=jsondb} [TIP] Query 'SELECT FROM Thing' returned a result set with more than 10000
records. Check if you really need all these records, or reduce the resultset by using a LIMIT
to improve both performance and used RAM

My example is on SO here:
<http://stackoverflow.com/questions/37842775/orientdb-large-query-tips-how-to-safely-implement-them-or-not-supported>.
Should I ignore the warning, or should I be doing something else? (A rough sketch of how I'm
driving OSQLAsynchQuery is at the bottom of this message.)

On Tuesday, June 14, 2016 at 8:37:50 AM UTC-7, Stuart Reynolds wrote:
>
> Bump!
>
> I have confirmed that pagination still has this odd (buggy?) behavior in
> 2.1.19 and filed a bug:
> https://github.com/orientechnologies/orientdb/issues/6298
>
> On Thursday, June 9, 2016 at 10:08:26 AM UTC-7, Stuart Reynolds wrote:
>>
>> (Sorry for the repost -- my original question was a mess. I have deleted it
>> and am reposting.)
>>
>> I'd like to iterate through a very large set of records in OrientDB.
>>
>> So that the result doesn't fill up my machine's memory, I've tried to
>> implement paginated queries, but I seem to be getting back:
>>
>> - duplicated documents
>> - record sets shorter than the page size
>> - an infinite series of results
>>
>> The original Java method listed in the docs
>> <http://orientdb.com/docs/last/Pagination.html> is as follows:
>>
>> OSQLSynchQuery<ODocument> query =
>>     new OSQLSynchQuery<ODocument>("select from Customer LIMIT 20");
>> for (List<ODocument> resultset = database.query(query);
>>      !resultset.isEmpty(); resultset = database.query(query)) {
>>   ...
>> }
>>
>> I've implemented this in Scala:
>>
>> val query = new OSQLSynchQuery[ODocument]("select from Thing LIMIT 5")
>> var resultset = db.query[OResultSet[ODocument]](query)
>> while (!resultset.isEmpty()) {
>>   // process result set here
>>   resultset = db.query(query)
>> }
>>
>> Here's the full example:
>>
>> def makeThing(x: Int) = {
>>   val doc = new ODocument("Thing")
>>   doc.field("x", x)
>>   doc
>> }
>>
>> val db: ODatabaseDocumentTx = new ODatabaseDocumentTx("memory:jsondb")
>> db.create()
>> db.set(MINIMUMCLUSTERS, 3)
>> db.set(CLUSTERSELECTION, "round-robin")
>> db.set(CONFLICTSTRATEGY, "content")
>> db.set(CHARSET, "UTF-8")
>>
>> println("SAVING--------")
>>
>> for (x <- 0 until 12) {
>>   val doc: ODocument = makeThing(x)
>>   val saved = db.save[ODocument](doc)
>>   println(saved)
>> }
>>
>> println("\n\nQUERYING--------")
>>
>> val query = new OSQLSynchQuery[ODocument]("select from Thing LIMIT 5")
>> var resultset = db.query[OResultSet[ODocument]](query)
>> while (!resultset.isEmpty()) {
>>   resultset.toArray.foreach(println)
>>   resultset = db.query(query)
>>   println("---------")
>> }
>>
>> But here's the output:
>>
>> SAVING--------
>> Thing#9:0{x:0} v1
>> Thing#10:0{x:1} v1
>> Thing#11:0{x:2} v1
>> Thing#9:1{x:3} v1
>> Thing#10:1{x:4} v1
>> Thing#11:1{x:5} v1
>> Thing#9:2{x:6} v1
>> Thing#10:2{x:7} v1
>> Thing#11:2{x:8} v1
>> Thing#9:3{x:9} v1
>> Thing#10:3{x:10} v1
>> Thing#11:3{x:11} v1
>>
>> QUERYING--------
>> Thing#9:0{x:0} v1
>> Thing#9:1{x:3} v1
>> Thing#9:2{x:6} v1
>> Thing#9:3{x:9} v1
>> Thing#10:0{x:1} v1    # So far, so good...
>> ---------
>> Thing#9:0{x:0} v1     # Already seen this
>> Thing#10:1{x:4} v1
>> Thing#10:2{x:7} v1
>> Thing#10:3{x:10} v1
>> Thing#11:0{x:2} v1
>> ---------
>> Thing#9:0{x:0} v1     # Already seen this
>> Thing#11:1{x:5} v1
>> Thing#11:2{x:8} v1
>> Thing#11:3{x:11} v1   # Page cut short
>> ---------
>> Thing#9:0{x:0} v1     # Already seen this!
>> ---------
>> Thing#9:1{x:3} v1
>> Thing#9:2{x:6} v1
>> Thing#9:3{x:9} v1
>> Thing#10:0{x:1} v1
>> Thing#10:1{x:4} v1
>>
>> Note that the DB is in memory, and no one is writing to it concurrently.
>>
>> Using ODB client 2.1.1.
>>
>> What's the sane and safe way to iterate through a very large dataset? As
>> far as I can see, the method in the docs is buggy.
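
For reference, here is roughly how I'm driving OSQLAsynchQuery at the moment. This is a
minimal, untested sketch: it reuses the db handle from the example above and assumes the
two-method OCommandResultListener interface of the 2.1.x client (newer clients may require
overriding additional methods such as getResult()):

import com.orientechnologies.orient.core.command.OCommandResultListener
import com.orientechnologies.orient.core.record.impl.ODocument
import com.orientechnologies.orient.core.sql.query.OSQLAsynchQuery

// Each record is pushed to the listener as it arrives, so nothing is
// accumulated client-side.
val listener = new OCommandResultListener {
  var count = 0
  override def result(iRecord: AnyRef): Boolean = {
    val doc = iRecord.asInstanceOf[ODocument]
    count += 1
    // ... process one document here ...
    true // returning false tells OrientDB to stop sending results
  }
  override def end(): Unit = println(s"done: $count records")
  // NOTE: on newer client versions this anonymous class may also need
  // to override getResult().
}

db.query[java.util.List[ODocument]](
  new OSQLAsynchQuery[ODocument]("select from Thing", listener))

The point being that result() is invoked once per record, so the client never has to hold
the whole result set in memory.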

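For completeness, the same Pagination page also describes paginating on @rid instead of
relying on the query object's internal LIMIT state. Below is a rough, untested Scala
transcription of that pattern against my Thing class; I haven't yet checked whether it
avoids the duplicates and short pages shown above:

import com.orientechnologies.orient.core.id.{ORID, ORecordId}
import com.orientechnologies.orient.core.record.impl.ODocument
import com.orientechnologies.orient.core.sql.query.OSQLSynchQuery

// Remember the last RID seen and ask for records after it. A freshly
// constructed ORecordId() compares lower than any stored RID, so the
// first page starts at the beginning of the class.
val ridQuery = new OSQLSynchQuery[ODocument]("select from Thing where @rid > ? LIMIT 5")
var last: ORID = new ORecordId()
var page = db.query[java.util.List[ODocument]](ridQuery, last)
while (!page.isEmpty) {
  for (i <- 0 until page.size) println(page.get(i))
  last = page.get(page.size - 1).getIdentity  // resume after the last record of this page
  page = db.query[java.util.List[ODocument]](ridQuery, last)
  println("---------")
}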