Re: [orientdb] Indices and Memory Usage

2017-05-18 Thread John J. Szucs
I will give this a try as soon as I can. I am unfortunately still in hospital, 
looking to be released today (Thursday) or Friday (US East time).

---
John J. Szucs (on my iPhone)

> On May 18, 2017, at 04:52, Luca Garulli <l.garu...@orientdb.com> wrote:
> 
> Hi John,
> 
> At OrientDB startup, didn't you see a warning in the log about this missing 
> setting?
> 
> 
> Best Regards,
> 
> Luca Garulli
> Founder & CEO
> OrientDB LTD
> 
>> On 18 May 2017 at 09:07, Andrey Lomakin <lomakin.and...@gmail.com> wrote:
>> To calculate how much memory OrientDB should consume, please look at the 
>> com.orientechnologies.common.directmemory:type=OByteBufferPoolMXBean JMX 
>> bean. It has an attribute, "preAllocationLimit", which shows how much 
>> direct memory the server is currently set to use.
>> 
>> If you need to be sure that the ODB server consumes no more than you intend 
>> to give it, please set the system property "storage.diskCache.bufferSize" 
>> to the amount of direct memory, in megabytes, that you wish to allow the 
>> server to allocate.
>> 
>> To confirm that the ODB server allocates no more direct memory than 
>> intended, please run the test and, after 24 hours (I suppose that is 
>> enough), send us the attribute values of the following JMX beans:
>> com.orientechnologies.common.directmemory:type=OByteBufferPoolMXBean
>> java.nio:type=BufferPool,name=direct
>> Then we can check whether ODB consumes direct memory according to the 
>> parameters you set.
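
For anyone following along, the second bean named above (java.nio:type=BufferPool,name=direct) can also be read from inside the JVM. This is only a minimal sketch, assuming the code runs in the same JVM as the database (for example, an embedded setup); DirectMemoryProbe and directPool are illustrative names, not part of OrientDB. It uses only the standard java.lang.management API and does not read OByteBufferPoolMXBean, whose attributes are easier to inspect with a JMX client such as jconsole.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.List;

class DirectMemoryProbe {
    /** Returns the JVM's "direct" buffer pool bean, or null if the platform does not expose one. */
    static BufferPoolMXBean directPool() {
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            if ("direct".equals(pool.getName())) {
                return pool;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        BufferPoolMXBean pool = directPool();
        if (pool == null) {
            System.out.println("This JVM does not expose a direct buffer pool bean.");
            return;
        }
        // Allocate 1 MB of direct memory so the counters are visibly non-zero.
        ByteBuffer probe = ByteBuffer.allocateDirect(1 << 20);
        System.out.printf("buffers=%d used=%d bytes capacity=%d bytes%n",
                pool.getCount(), pool.getMemoryUsed(), pool.getTotalCapacity());
        // Reference the buffer after the read so it stays reachable until here.
        System.out.println("probe capacity=" + probe.capacity());
    }
}
```

Logging these three values at intervals during a long import would show whether direct memory keeps growing or levels off below the configured limit.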
>> 
>> P.S. I do not think that switching to a server with more memory will add 
>> any value. I suppose it would be better to adjust the server settings 
>> accordingly.
>> 
>> 
>>> On Thu, May 18, 2017 at 9:55 AM Andrey Lomakin <lomakin.and...@gmail.com> 
>>> wrote:
>>> Hi John,
>>> 
>>> I suppose you did not set -XX:MaxDirectMemorySize=512g parameter.
>>> 
>>> 
>>>> On Wed, May 17, 2017 at 7:07 PM John J. Szucs <john.j.sz...@gmail.com> 
>>>> wrote:
>>>> After 70 hours on a 32GB VM, ODB 2.2.20, JRE 8u131, the job failed with a 
>>>> direct buffer memory exception. Given the complications I mentioned above, 
>>>> my next step is going to be to get a high-RAM AWS EC2 instance and run 
>>>> this there. However, as I mentioned above, my leadership is getting 
>>>> frustrated with this situation.
>>>> 
>>>> -- John
>>>> 
>>>> 'Battle of banja luka'. 
>>>> com.orientechnologies.orient.core.exception.ODatabaseException: Error on 
>>>> retrieving record #63:19090001 (cluster: xlink_simple_2)
>>>> 
>>>>DB name="kb"
>>>>at 
>>>> com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.executeReadRecord(ODatabaseDocumentTx.java:2050)
>>>>at 
>>>> com.orientechnologies.orient.core.tx.OTransactionOptimistic.loadRecord(OTransactionOptimistic.java:187)
>>>>at 
>>>> com.orientechnologies.orient.core.tx.OTransactionOptimistic.loadRecord(OTransactionOptimistic.java:162)
>>>>at 
>>>> com.orientechnologies.orient.core.tx.OTransactionOptimistic.loadRecord(OTransactionOptimistic.java:291)
>>>>at 
>>>> com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.load(ODatabaseDocumentTx.java:1729)
>>>>at 
>>>> com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.load(ODatabaseDocumentTx.java:102)
>>>>at 
>>>> com.orientechnologies.orient.core.id.ORecordId.getRecord(ORecordId.java:329)
>>>>at 
>>>> com.tinkerpop.blueprints.impls.orient.OrientEdgeIterator.createGraphElement(OrientEdgeIterator.java:72)
>>>>at 
>>>> com.tinkerpop.blueprints.impls.orient.OrientEdgeIterator.createGraphElement(OrientEdgeIterator.java:44)
>>>>at 
>>>> com.orientechnologies.orient.core.iterator.OLazyWrapperIterator.hasNext(OLazyWrapperIterator.java:93)
>>>>at 
>>>> com.orientechnologies.common.collection.OMultiCollectionIterator.hasNextInternal(OMultiCollectionIterator.java:97)
>>>>at 
>>>> com.orientechnologies.common.collection.OMultiCollectionIterator.hasNext(OMultiCollectionIterator.java:78)
>>>>at com.lusidity.mind.model.Node.getLinks(Node.java:308)
>>>>at com.lusidity.mind.model.Node.hasLink(Node.java:435)
>>>>at 

Re: [orientdb] Indices and Memory Usage

2017-05-18 Thread John J. Szucs
Andrey,

Why would I set MaxDirectMemorySize to 512g when I only have 32GB in the VM? 
Wouldn't that just lead to massive swapping?

---
John J. Szucs (on my iPhone)

> On May 18, 2017, at 02:55, Andrey Lomakin <lomakin.and...@gmail.com> wrote:
> 
> Hi John,
> 
> I suppose you did not set -XX:MaxDirectMemorySize=512g parameter.
> 
> 
>> On Wed, May 17, 2017 at 7:07 PM John J. Szucs <john.j.sz...@gmail.com> wrote:
>> After 70 hours on a 32GB VM, ODB 2.2.20, JRE 8u131, the job failed with a 
>> direct buffer memory exception. Given the complications I mentioned above, 
>> my next step is going to be to get a high-RAM AWS EC2 instance and run this 
>> there. However, as I mentioned above, my leadership is getting frustrated 
>> with this situation.
>> 
>> -- John
>> 
>> 'Battle of banja luka'. 
>> com.orientechnologies.orient.core.exception.ODatabaseException: Error on 
>> retrieving record #63:19090001 (cluster: xlink_simple_2)
>> 
>>  DB name="kb"
>>  at 
>> com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.executeReadRecord(ODatabaseDocumentTx.java:2050)
>>  at 
>> com.orientechnologies.orient.core.tx.OTransactionOptimistic.loadRecord(OTransactionOptimistic.java:187)
>>  at 
>> com.orientechnologies.orient.core.tx.OTransactionOptimistic.loadRecord(OTransactionOptimistic.java:162)
>>  at 
>> com.orientechnologies.orient.core.tx.OTransactionOptimistic.loadRecord(OTransactionOptimistic.java:291)
>>  at 
>> com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.load(ODatabaseDocumentTx.java:1729)
>>  at 
>> com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.load(ODatabaseDocumentTx.java:102)
>>  at 
>> com.orientechnologies.orient.core.id.ORecordId.getRecord(ORecordId.java:329)
>>  at 
>> com.tinkerpop.blueprints.impls.orient.OrientEdgeIterator.createGraphElement(OrientEdgeIterator.java:72)
>>  at 
>> com.tinkerpop.blueprints.impls.orient.OrientEdgeIterator.createGraphElement(OrientEdgeIterator.java:44)
>>  at 
>> com.orientechnologies.orient.core.iterator.OLazyWrapperIterator.hasNext(OLazyWrapperIterator.java:93)
>>  at 
>> com.orientechnologies.common.collection.OMultiCollectionIterator.hasNextInternal(OMultiCollectionIterator.java:97)
>>  at 
>> com.orientechnologies.common.collection.OMultiCollectionIterator.hasNext(OMultiCollectionIterator.java:78)
>>  at com.lusidity.mind.model.Node.getLinks(Node.java:308)
>>  at com.lusidity.mind.model.Node.hasLink(Node.java:435)
>>  at 
>> com.lusidity.mind.etl.providers.mediawiki.BaseMediaWikiPage.loadHyperlinks(BaseMediaWikiPage.java:401)
>>  at 
>> com.lusidity.mind.etl.providers.mediawiki.BaseMediaWikiPage.link(BaseMediaWikiPage.java:260)
>>  at 
>> com.lusidity.mind.etl.providers.mediawiki.BaseMediaWikiPage.load(BaseMediaWikiPage.java:240)
>>  at 
>> com.lusidity.mind.etl.providers.mediawiki.BaseMediaWikiPage.process(BaseMediaWikiPage.java:98)
>>  at 
>> com.lusidity.mind.etl.providers.mediawiki.ArticleHandler.process(ArticleHandler.java:113)
>>  at 
>> com.lusidity.mind.etl.providers.mediawiki.ArticleHandler.process(ArticleHandler.java:75)
>>  at info.bliki.wiki.dump.WikiXMLParser.endElement(WikiXMLParser.java:155)
>>  at 
>> com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:609)
>>  at 
>> com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1782)
>>  at 
>> com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2967)
>>  at 
>> com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
>>  at 
>> com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
>>  at 
>> com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
>>  at 
>> com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:841)
>>  at 
>> com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:770)
>>  at 
>> com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
>>  at 
>> com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)

Re: [orientdb] Indices and Memory Usage

2017-05-17 Thread John J. Szucs
cateDirect(ByteBuffer.java:311)
at
com.orientechnologies.common.directmemory.OByteBufferPool.allocateBuffer(OByteBufferPool.java:328)
at
com.orientechnologies.common.directmemory.OByteBufferPool.acquireDirect(OByteBufferPool.java:279)
at
com.orientechnologies.orient.core.storage.cache.local.OWOWCache.cacheFileContent(OWOWCache.java:1280)
at
com.orientechnologies.orient.core.storage.cache.local.OWOWCache.load(OWOWCache.java:656)
at
com.orientechnologies.orient.core.storage.cache.local.twoq.O2QCache.updateCache(O2QCache.java:1102)
at
com.orientechnologies.orient.core.storage.cache.local.twoq.O2QCache.doLoad(O2QCache.java:353)
at
com.orientechnologies.orient.core.storage.cache.local.twoq.O2QCache.load(O2QCache.java:298)
at
com.orientechnologies.orient.core.storage.impl.local.paginated.base.ODurableComponent.loadPage(ODurableComponent.java:148)
at
com.orientechnologies.orient.core.storage.impl.local.paginated.OPaginatedCluster.readRecordBuffer(OPaginatedCluster.java:691)
at
com.orientechnologies.orient.core.storage.impl.local.paginated.OPaginatedCluster.readRecord(OPaginatedCluster.java:667)
at
com.orientechnologies.orient.core.storage.impl.local.paginated.OPaginatedCluster.readRecord(OPaginatedCluster.java:646)
at
com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.doReadRecord(OAbstractPaginatedStorage.java:3260)
at
com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.readRecord(OAbstractPaginatedStorage.java:2879)
at
com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.readRecord(OAbstractPaginatedStorage.java:1064)
at
com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx$SimpleRecordReader.readRecord(ODatabaseDocumentTx.java:3436)
at
com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.executeReadRecord(ODatabaseDocumentTx.java:2012)
... 47 common frames omitted


On Tue, May 16, 2017 at 11:42 AM, John J. Szucs <john.j.sz...@gmail.com>
wrote:

> I've had some "complications" (namely, being hospitalized for a medical
> issue), but I am running the job right now with OrientDB 2.2.20 and JRE
> 8u131. It's only a 32GB VM for now, but it's almost 50% complete and the
> results are good so far.
>
> On Mon, May 15, 2017 at 10:29 AM, Claudio Massi <massi.clau...@gmail.com>
> wrote:
>
>> Hi John,
>>if you have 64gb ram, to avoid swapping jvm, try to keep process size
>> below 64gb, so use Xmx + MaxDirectMemorySize below the available ram
>>
>> Try orientdb 2.2.20 with java 8u131-b11 , if using G1GC
>>
>> Monitor heap usage with: jstat -gc  pid 120s 999
>>
>> Monitor direct memory usage with any jmx tool (see
>> http://andreylomakin.blogspot.it/2016/05/how-to-calculate-maximum-amount-of.html )
>> - use jconsole, section MBeans, choose  
>> com.orientechnologies.common.directmemory
>> -> OByteBufferPoolMXBean -> Attribute
>> - use MonBuffers.java (Source from Alan B. in
>> https://gist.github.com/t3rmin4t0r/1a753ccdcfa8d111f07c  then increment
>> Thread.sleep(2000), and run adding tools.jar in classpath )
>> - use jmxterm (http://wiki.cyclopsgroup.org/jmxterm/)
>> ...
>>
>> Claudio
>>
>> On Friday, May 5, 2017 at 18:57:26 UTC+2, John J. Szucs wrote:
>>>
>>> Andrey,
>>>
>>> THANK YOU! I will give this a try as soon as I can.
>>>
>>> I will also do some JVM profi
>>>
>>> — John
>>>
>>> On May 5, 2017, at 05:05, Andrey Lomakin <lomakin...@gmail.com> wrote:
>>>
>>> Hi John,
>>> If you wish, you can use this build until we make an official release:
>>> https://drive.google.com/file/d/0B2oZq2xVp841T2diVGtTcmZ5OTQ/view?usp=sharing
>>>
>>> On Fri, May 5, 2017 at 11:58 AM Andrey Lomakin <lomakin...@gmail.com>
>>> wrote:
>>>
>>>> HI John,
>>>>
>>>> I suppose you encountered issue
>>>> https://github.com/orientechnologies/orientdb/issues/7390
>>>> We will provide a release soon.
>>>>
>>>> Also, please do not use such a huge heap size. We use the heap only to keep
>>>> temporary data, so I suggest you lower the heap size to give ODB the chance
>>>> to use more direct memory.
>>>>
>>>> On Fri, May 5, 2017 at 10:51 AM Luigi Dell'Aquila <
>>>> luigi.de...@gmail.com> wrote:
>>>>
>>>>> Hi John,
>>>>>
>>>>> How are you doing the import? Are you working in transaction? Some
>>>>> code will help us understand where the problem is
>>>>>
>>>>> Thanks
>>>>>
>>>>> Luigi
>>>>>
>

Re: [orientdb] Indices and Memory Usage

2017-05-16 Thread John J. Szucs
I've had some "complications" (namely, being hospitalized for a medical
issue), but I am running the job right now with OrientDB 2.2.20 and JRE
8u131. It's only a 32GB VM for now, but it's almost 50% complete and the
results are good so far.

On Mon, May 15, 2017 at 10:29 AM, Claudio Massi <massi.clau...@gmail.com>
wrote:

> Hi John,
>if you have 64gb ram, to avoid swapping jvm, try to keep process size
> below 64gb, so use Xmx + MaxDirectMemorySize below the available ram
>
> Try orientdb 2.2.20 with java 8u131-b11 , if using G1GC
>
> Monitor heap usage with: jstat -gc  pid 120s 999
>
> Monitor direct memory usage with any jmx tool (see
> http://andreylomakin.blogspot.it/2016/05/how-to-calculate-maximum-amount-of.html )
> - use jconsole, section MBeans, choose  
> com.orientechnologies.common.directmemory
> -> OByteBufferPoolMXBean -> Attribute
> - use MonBuffers.java (Source from Alan B. in
> https://gist.github.com/t3rmin4t0r/1a753ccdcfa8d111f07c  then increment
> Thread.sleep(2000), and run adding tools.jar in classpath )
> - use jmxterm (http://wiki.cyclopsgroup.org/jmxterm/)
> ...
>
> Claudio
>
> On Friday, May 5, 2017 at 18:57:26 UTC+2, John J. Szucs wrote:
>>
>> Andrey,
>>
>> THANK YOU! I will give this a try as soon as I can.
>>
>> I will also do some JVM profi
>>
>> — John
>>
>> On May 5, 2017, at 05:05, Andrey Lomakin <lomakin...@gmail.com> wrote:
>>
>> Hi John,
>> If you wish, you can use this build until we make an official release:
>> https://drive.google.com/file/d/0B2oZq2xVp841T2diVGtTcmZ5OTQ/view?usp=sharing
>>
>> On Fri, May 5, 2017 at 11:58 AM Andrey Lomakin <lomakin...@gmail.com>
>> wrote:
>>
>>> HI John,
>>>
>>> I suppose you encountered issue
>>> https://github.com/orientechnologies/orientdb/issues/7390
>>> We will provide a release soon.
>>>
>>> Also, please do not use such a huge heap size. We use the heap only to keep
>>> temporary data, so I suggest you lower the heap size to give ODB the chance
>>> to use more direct memory.
>>>
>>> On Fri, May 5, 2017 at 10:51 AM Luigi Dell'Aquila <luigi.de...@gmail.com>
>>> wrote:
>>>
>>>> Hi John,
>>>>
>>>> How are you doing the import? Are you working in transaction? Some code
>>>> will help us understand where the problem is
>>>>
>>>> Thanks
>>>>
>>>> Luigi
>>>>
>>>>
>>>> 2017-05-05 3:53 GMT+02:00 John J. Szucs <john.j...@gmail.com>:
>>>>
>>>>> Hello, OrientDB community! It's me again with another question.
>>>>>
>>>>> I am still working on my project and have encountered another serious
>>>>> challenge: it seems that writing to indices (especially edge indices?) can
>>>>> cause OrientDB's direct (non-JVM) memory usage to grow without bounds
>>>>> until the system effectively grinds to a halt due to swap.
>>>>>
>>>>> The specific use case is building a graph based on (English)
>>>>> Wikipedia. There are approximately 17.4M vertices representing pages
>>>>> (including articles, categories, and various meta pages). These vertices
>>>>> are connected by approximately 65M (at last count) edges. There are a few
>>>>> super-nodes. For example, the vertex representing
>>>>> https://en.wikipedia.org/wiki/United_States has (at last count) 306K
>>>>> incoming edges and 822 outgoing edges. However, the degree of the vertices
>>>>> roughly follows a Zipf distribution and the vast majority of vertices have
>>>>> only a few (<10) total (in and out) edges. There are also some other vertex
>>>>> and edge types for lexical data, but I think those are secondary to the
>>>>> issue.
>>>>>
>>>>> Per previous discussion here and on StackOverflow, I have added
>>>>> automatic edge indices on in, out, or the composite of the two to optimize
>>>>> edge queries. When I run the process to extract, transform, and load the
>>>>> data from Wikipedia's XML dumps (using my own ETL code, not OrientDB's),
>>>>> after 24-48 hours, the Linux System Monitor shows that physical memory
>>>>> usage has reached 99.9% and then swap usage begins to grow. At this point,
>>>>> the process is effectively halted by swap thrashing.
>>>>>
>

Re: [orientdb] Indices and Memory Usage

2017-05-05 Thread John J. Szucs
Andrey,

THANK YOU! I will give this a try as soon as I can.

I will also do some JVM profi

— John

On May 5, 2017, at 05:05, Andrey Lomakin <lomakin.and...@gmail.com> wrote:

Hi John,
If you wish, you can use this build until we make an official release:
https://drive.google.com/file/d/0B2oZq2xVp841T2diVGtTcmZ5OTQ/view?usp=sharing

On Fri, May 5, 2017 at 11:58 AM Andrey Lomakin <lomakin.and...@gmail.com>
wrote:

> HI John,
>
> I suppose you encountered issue
> https://github.com/orientechnologies/orientdb/issues/7390
> We will provide a release soon.
>
> Also, please do not use such a huge heap size. We use the heap only to keep
> temporary data, so I suggest you lower the heap size to give ODB the chance
> to use more direct memory.
>
> On Fri, May 5, 2017 at 10:51 AM Luigi Dell'Aquila <
> luigi.dellaqu...@gmail.com> wrote:
>
>> Hi John,
>>
>> How are you doing the import? Are you working in transaction? Some code
>> will help us understand where the problem is
>>
>> Thanks
>>
>> Luigi
>>
>>
>> 2017-05-05 3:53 GMT+02:00 John J. Szucs <john.j.sz...@gmail.com>:
>>
>>> Hello, OrientDB community! It's me again with another question.
>>>
>>> I am still working on my project and have encountered another serious
>>> challenge: it seems that writing to indices (especially edge indices?) can
>>> cause OrientDB's direct (non-JVM) memory usage to grow without bounds until
>>> the system effectively grinds to a halt due to swap.
>>>
>>> The specific use case is building a graph based on (English) Wikipedia.
>>> There are approximately 17.4M vertices representing pages (including
>>> articles, categories, and various meta pages). These vertices are connected
>>> by approximately 65M (at last count) edges. There are a few super-nodes.
>>> For example, the vertex representing
>>> https://en.wikipedia.org/wiki/United_States has (at last count) 306K incoming
>>> edges and 822 outgoing edges. However, the degree of the vertices roughly
>>> follows a Zipf distribution and the vast majority of vertices have only a
>>> few (<10) total (in and out) edges. There are also some other vertex and
>>> edge types for lexical data, but I think those are secondary to the issue.
>>>
>>> Per previous discussion here and on StackOverflow, I have added
>>> automatic edge indices on in, out, or the composite of the two to optimize
>>> edge queries. When I run the process to extract, transform, and load the
>>> data from Wikipedia's XML dumps (using my own ETL code, not OrientDB's),
>>> after 24-48 hours, the Linux System Monitor shows that physical memory
>>> usage has reached 99.9% and then swap usage begins to grow. At this point,
>>> the process is effectively halted by swap thrashing.
>>>
>>> I am running this on a Fedora 25 Linux VM with 64GB RAM and 16 CPU cores
>>> allocated. The JVM settings are as follows:
>>>
>>> -Xmx32g -Xms32g -server -XX:+PerfDisableSharedMem -XX:+UseG1GC
>>> -XX:MaxDirectMemorySize=64413m -Dstorage.wal.syncOnPageFlush=false
>>>
>>> The MaxDirectMemorySize parameter is recommended by OrientDB itself,
>>> during start-up with the "out-of-memory errors" warning. It does seem odd
>>> to me that Xmx + MaxDirectMemorySize > available RAM, but I'm more of a
>>> deep R&D (not DevOps) guy, so I'm just accepting that unless someone
>>> advises me otherwise.
>>>
>>> If I disable the edge indices, then the process runs fine and completes
>>> in a "reasonable" (for it) amount of time: 2-3 days. Of course, if I do
>>> this, my run-time performance suffers intolerably.
>>>
>>> I am running this with OrientDB 2.2.19. I was able to quickly get my
>>> code to build with 3.0 M1, but some of the unit tests fail and I am under
>>> far too much pressure about this issue from my leadership to try to
>>> troubleshoot them right now.
>>>
>>> What can I do to solve this issue? Thanks in advance for your help!
>>>
>>> -- John
>>>
>>> --
>>>
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "OrientDB" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to orient-database+unsubscr...@googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>

Re: [orientdb] Indices and Memory Usage

2017-05-05 Thread John J. Szucs
Andrey,

THANK YOU! I will give this a try as soon as I can.

I will also do some profiling to see where I really need my JVM heap size to be.

— John

> On May 5, 2017, at 05:05, Andrey Lomakin <lomakin.and...@gmail.com> wrote:
> 
> Hi John,
> If you wish, you can use this build until we make an official release:
> https://drive.google.com/file/d/0B2oZq2xVp841T2diVGtTcmZ5OTQ/view?usp=sharing
>  
> 
> On Fri, May 5, 2017 at 11:58 AM Andrey Lomakin <lomakin.and...@gmail.com> wrote:
> HI John,
> 
> I suppose you encountered issue 
> https://github.com/orientechnologies/orientdb/issues/7390 
> We will provide a release soon.
> 
> Also, please do not use such a huge heap size. We use the heap only to keep 
> temporary data, so I suggest you lower the heap size to give ODB the chance 
> to use more direct memory.
> 
> On Fri, May 5, 2017 at 10:51 AM Luigi Dell'Aquila <luigi.dellaqu...@gmail.com> wrote:
> Hi John,
> 
> How are you doing the import? Are you working in transaction? Some code will 
> help us understand where the problem is
> 
> Thanks
> 
> Luigi
> 
> 
> 2017-05-05 3:53 GMT+02:00 John J. Szucs <john.j.sz...@gmail.com>:
> Hello, OrientDB community! It's me again with another question.
> 
> I am still working on my project and have encountered another serious 
> challenge: it seems that writing to indices (especially edge indices?) can 
> cause OrientDB's direct (non-JVM) memory usage to grow without bounds until 
> the system effectively grinds to a halt due to swap.
> 
> The specific use case is building a graph based on (English) Wikipedia. There 
> are approximately 17.4M vertices representing pages (including articles, 
> categories, and various meta pages). These vertices are connected by 
> approximately 65M (at last count) edges. There are a few super-nodes. For 
> example, the vertex representing https://en.wikipedia.org/wiki/United_States 
> has (at last count) 306K 
> incoming edges and 822 outgoing edges. However, the degree of the vertices 
> roughly follows a Zipf distribution and the vast majority of vertices have 
> only a few (<10) total (in and out) edges. There are also some other vertex 
> and edge types for lexical data, but I think those are secondary to the issue.
> 
> Per previous discussion here and on StackOverflow, I have added automatic 
> edge indices on in, out, or the composite of the two to optimize edge 
> queries. When I run the process to extract, transform, and load the data from 
> Wikipedia's XML dumps (using my own ETL code, not OrientDB's), after 24-48 
> hours, the Linux System Monitor shows that physical memory usage has reached 
> 99.9% and then swap usage begins to grow. At this point, the process is 
> effectively halted by swap thrashing.
> 
> I am running this on a Fedora 25 Linux VM with 64GB RAM and 16 CPU cores 
> allocated. The JVM settings are as follows:
> 
> -Xmx32g -Xms32g -server -XX:+PerfDisableSharedMem -XX:+UseG1GC 
> -XX:MaxDirectMemorySize=64413m -Dstorage.wal.syncOnPageFlush=false
> 
> The MaxDirectMemorySize parameter is recommended by OrientDB itself, during 
> start-up with the "out-of-memory errors" warning. It does seem odd to me that 
> Xmx + MaxDirectMemorySize > available RAM, but I'm more of a deep R&D (not 
> DevOps) guy, so I'm just accepting that unless someone advises me otherwise.
> 
> If I disable the edge indices, then the process runs fine and completes in a 
> "reasonable" (for it) amount of time: 2-3 days. Of course, if I do this, my 
> run-time performance suffers intolerably.
> 
> I am running this with OrientDB 2.2.19. I was able to quickly get my code to 
> build with 3.0 M1, but some of the unit tests fail and I am under far too 
> much pressure about this issue from my leadership to try to troubleshoot them 
> right now.
> 
> What can I do to solve this issue? Thanks in advance for your help!
> 
> -- John
> 

[orientdb] Indices and Memory Usage

2017-05-04 Thread John J. Szucs
Hello, OrientDB community! It's me again with another question.

I am still working on my project and have encountered another serious 
challenge: it seems that writing to indices (especially edge indices?) can 
cause OrientDB's direct (non-JVM) memory usage to grow without bounds until 
the system effectively grinds to a halt due to swap.

The specific use case is building a graph based on (English) Wikipedia. 
There are approximately 17.4M vertices representing pages (including 
articles, categories, and various meta pages). These vertices are connected 
by approximately 65M (at last count) edges. There are a few super-nodes. 
For example, the vertex representing 
https://en.wikipedia.org/wiki/United_States has (at last count) 306K 
incoming edges and 822 outgoing edges. However, the degree of the vertices 
roughly follows a Zipf distribution and the vast majority of vertices have 
only a few (<10) total (in and out) edges. There are also some other vertex 
and edge types for lexical data, but I think those are secondary to the 
issue.

Per previous discussion here and on StackOverflow, I have added automatic 
edge indices on in, out, or the composite of the two to optimize edge 
queries. When I run the process to extract, transform, and load the data 
from Wikipedia's XML dumps (using my own ETL code, not OrientDB's), after 
24-48 hours, the Linux System Monitor shows that physical memory usage has 
reached 99.9% and then swap usage begins to grow. At this point, the 
process is effectively halted by swap thrashing.

I am running this on a Fedora 25 Linux VM with 64GB RAM and 16 CPU cores 
allocated. The JVM settings are as follows:

-Xmx32g -Xms32g -server -XX:+PerfDisableSharedMem -XX:+UseG1GC 
-XX:MaxDirectMemorySize=64413m -Dstorage.wal.syncOnPageFlush=false

The MaxDirectMemorySize parameter is recommended by OrientDB itself, during 
start-up with the "out-of-memory errors" warning. It does seem odd to me 
that Xmx + MaxDirectMemorySize > available RAM, but I'm more of a deep R&D (not 
DevOps) guy, so I'm just accepting that unless someone advises me otherwise.
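
The doubt raised in the paragraph above can be checked with simple arithmetic. The sketch below adds the -Xmx and -XX:MaxDirectMemorySize values from the JVM settings and compares the total with the 64GB allocated to the VM. It assumes heap and direct memory are the two dominant consumers and ignores metaspace, thread stacks, and the operating system itself; MemoryBudget and its method names are made up for illustration.

```java
class MemoryBudget {
    /** Sum of the heap cap (-Xmx) and the direct-memory cap (-XX:MaxDirectMemorySize), in MB. */
    static long cappedMb(long heapMb, long directMb) {
        return heapMb + directMb;
    }

    /** Positive when the two caps together exceed physical RAM, negative when there is headroom. */
    static long overshootMb(long heapMb, long directMb, long ramMb) {
        return cappedMb(heapMb, directMb) - ramMb;
    }

    public static void main(String[] args) {
        long heapMb = 32L * 1024;  // -Xmx32g
        long directMb = 64413L;    // -XX:MaxDirectMemorySize=64413m
        long ramMb = 64L * 1024;   // 64GB allocated to the VM

        System.out.println("heap + direct caps: " + cappedMb(heapMb, directMb) + " MB"); // 97181 MB
        System.out.println("physical RAM:       " + ramMb + " MB");                      // 65536 MB
        System.out.println("overshoot:          "
                + overshootMb(heapMb, directMb, ramMb) + " MB");                         // 31645 MB
    }
}
```

With these settings the two caps exceed physical RAM by roughly 31GB. MaxDirectMemorySize is an upper limit rather than a reservation, so the overshoot would only bite once the direct pool actually fills, which would fit the 24-48 hour delay before swapping described above.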

If I disable the edge indices, then the process runs fine and completes in 
a "reasonable" (for it) amount of time: 2-3 days. Of course, if I do this, 
my run-time performance suffers intolerably.

I am running this with OrientDB 2.2.19. I was able to quickly get my code 
to build with 3.0 M1, but some of the unit tests fail and I am under far 
too much pressure about this issue from my leadership to try to 
troubleshoot them right now.

What can I do to solve this issue? Thanks in advance for your help!

-- John



Re: [orientdb] Get Edges Between Vertices (Again)

2017-03-19 Thread John J. Szucs
Luigi,

To close this out, I implemented your Option #2 suggestion. I found a very 
good example at 
http://stackoverflow.com/questions/32953396/orientdb-edge-index-via-java 
and the results are absolutely spectacular!
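
For readers who want the idea behind Option #2 without the database, here is a toy in-memory sketch of what a composite index on an edge's (out, in) pair buys: the lookup goes straight to the edges joining two given vertices instead of walking every edge of a supernode. This is not OrientDB API; EdgeIndexSketch, its methods, and the string identifiers are all hypothetical, and a real index would live on disk rather than in a HashMap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EdgeIndexSketch {
    // Composite key (out vertex, in vertex) -> ids of the edges that connect them.
    private final Map<List<String>, List<String>> byEndpoints = new HashMap<>();

    /** Registers an edge under its (out, in) pair, as an automatic index would on insert. */
    void addEdge(String edgeId, String outVertex, String inVertex) {
        byEndpoints
                .computeIfAbsent(Arrays.asList(outVertex, inVertex), k -> new ArrayList<>())
                .add(edgeId);
    }

    /** One hash lookup, independent of how many other edges either vertex has. */
    List<String> edgesBetween(String outVertex, String inVertex) {
        return byEndpoints.getOrDefault(
                Arrays.asList(outVertex, inVertex), Collections.emptyList());
    }

    public static void main(String[] args) {
        EdgeIndexSketch index = new EdgeIndexSketch();
        index.addEdge("e1", "Banja_Luka", "United_States");
        index.addEdge("e2", "Banja_Luka", "United_States");
        index.addEdge("e3", "United_States", "Banja_Luka");

        System.out.println(index.edgesBetween("Banja_Luka", "United_States")); // [e1, e2]
        System.out.println(index.edgesBetween("United_States", "Banja_Luka")); // [e3]
    }
}
```

The key is ordered, so the two directions are separate entries, which matches an index declared on (out, in) rather than on an unordered pair.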

Thanks for your help!

BTW: We will soon be launching our technology, which uses OrientDB for 
natural language processing and Semantic Web applications, on the Web. When 
we do, I will definitely write up a case study for you, Luca, and the other 
folks at OrientDB whose help has been so essential. And when we get 
investment or gain some traction (users, licensees, etc.) I look forward to 
buying a real license, support contract, etc. because that's a win-win for 
all of us!

-- John

On Thursday, March 16, 2017 at 9:46:57 AM UTC-4, Luigi Dell'Aquila wrote:
>
> Hi John,
>
> In the MATCH statement (2.2), and more generally in 3.0, we are changing the 
> optimization of queries based on indexes, but you will still need an index 
> on the edge for this use case, so most of the work will be needed 
> anyway. 
>
> Thanks
>
> Luigi
>
>
> 2017-03-16 12:43 GMT+01:00 John J. Szucs <john.j...@gmail.com>:
>
>> Luigi,
>>
>> Yes, this helps. Your option #2 is more applicable to my project.
>>
>> Will OrientDB 3.0 significantly change/improve this use case? I don't 
>> want to implement this manual edge index if it will become unnecessary in 
>> (a few?) weeks.
>>
>> Thanks!
>>
>> -- John
>>
>> On Mar 16, 2017, at 07:24, Luigi Dell'Aquila <luigi.de...@gmail.com> wrote:
>>
>> Hi John,
>>
>> you have two alternatives:
>>
>> 1) use OrientVertex.countEdges() to check which of the two vertices has a 
>> smaller number of edges. This approach is good if you know that at most one 
>> is a supernode
>>
>> 2) if you know that both vertices can be supernodes, then the only 
>> efficient way to find the edge is to define an index on edge(out, in) and 
>> do the indexed query directly
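
The first alternative above can be illustrated outside the database. The following is a rough sketch, assuming each vertex keeps a list of its incident edges: it compares the two list sizes (the role countEdges() plays in the suggestion) and scans only the shorter one. SmallerSideScan, Edge, link, and edgesBetween are hypothetical names and are not part of the OrientDB or Blueprints APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SmallerSideScan {
    /** Minimal edge record: an id plus its out and in vertex names. */
    static final class Edge {
        final String id;
        final String out;
        final String in;

        Edge(String id, String out, String in) {
            this.id = id;
            this.out = out;
            this.in = in;
        }

        boolean touches(String vertex) {
            return out.equals(vertex) || in.equals(vertex);
        }
    }

    /** Adds an edge to the incident-edge lists of both of its endpoints. */
    static void link(Map<String, List<Edge>> incident, String id, String out, String in) {
        Edge edge = new Edge(id, out, in);
        incident.computeIfAbsent(out, k -> new ArrayList<>()).add(edge);
        incident.computeIfAbsent(in, k -> new ArrayList<>()).add(edge);
    }

    /** Scans the incident edges of whichever vertex has fewer and keeps those reaching the other. */
    static List<String> edgesBetween(Map<String, List<Edge>> incident, String a, String b) {
        List<Edge> fromA = incident.getOrDefault(a, Collections.emptyList());
        List<Edge> fromB = incident.getOrDefault(b, Collections.emptyList());
        boolean scanA = fromA.size() <= fromB.size();
        List<Edge> smaller = scanA ? fromA : fromB;
        String other = scanA ? b : a;

        List<String> ids = new ArrayList<>();
        for (Edge edge : smaller) {
            if (edge.touches(other)) {
                ids.add(edge.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, List<Edge>> incident = new HashMap<>();
        link(incident, "e1", "hub", "x");
        link(incident, "e2", "y", "hub");
        link(incident, "e3", "x", "hub");
        // "x" has two incident edges and "hub" has three, so only x's list is scanned.
        System.out.println(edgesBetween(incident, "hub", "x")); // [e1, e3]
    }
}
```

The cost is proportional to the smaller degree, which is why this only helps when at most one of the two vertices is a supernode.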
>>
>> I hope it helps
>>
>> Thanks
>>
>> Luigi
>>
>>
>> 2017-03-16 11:18 GMT+01:00 John J. Szucs <john.j...@gmail.com>:
>>
>>> I need to *very quickly* find the edges that directly connect two 
>>> specified vertices, using either a SQL query, the Java API, or a 
>>> combination of the two. If it helps, at this point in the program, I know 
>>> for a fact that the two vertices are adjacent. What I'm trying to determine 
>>> is *how* they are adjacent.
>>>
>>> The OrientVertex.getEdges(OrientVertex, Direction, String ...) extension 
>>> to the Blueprints API does what I need to do functionally, but it can be 
>>> quite slow if the first vertex has many edges. Looking into the source 
>>> code, I found that this is because this method essentially gets *all* of 
>>> the edges from the first vertex that match the direction and label criteria 
>>> and then checks if they are adjacent (connect to) the second vertex.
>>>
>>> I have struggled with this for days. Does anyone have a better/faster 
>>> approach?
>>>
>>> Thanks!
>>>
>>> -- John
>>>
>>> -- 
>>>
>>> --- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "OrientDB" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to orient-databa...@googlegroups.com .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>
>

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to orient-database+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [orientdb] Get Edges Between Vertices (Again)

2017-03-16 Thread John J. Szucs
Luigi,

Yes, this helps. Your option #2 is more applicable to my project.

Will OrientDB 3.0 significantly change/improve this use case? I don't want to 
implement this manual edge index if it will become unnecessary in (a few?) 
weeks.
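
For readers wondering what a manual edge index amounts to, here is a minimal 
sketch in plain Java. It is a model of the idea only, not OrientDB code: the 
class EdgePairIndex, its methods, and the "out->in" key format are all 
hypothetical names chosen for illustration. In OrientDB itself the index 
would be defined on the edge class's out and in properties, as Luigi 
describes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of a manual (out, in) edge index; not the OrientDB API. */
class EdgePairIndex {
    // Hypothetical key format: the two record IDs joined with "->".
    private final Map<String, List<String>> edgesByEndpoints = new HashMap<>();

    private static String key(String outRid, String inRid) {
        return outRid + "->" + inRid;
    }

    /** Registers an edge under its (out, in) pair when the edge is created. */
    void add(String outRid, String inRid, String edgeRid) {
        edgesByEndpoints.computeIfAbsent(key(outRid, inRid), k -> new ArrayList<>()).add(edgeRid);
    }

    /** One hash lookup, regardless of how many edges either vertex has. */
    List<String> between(String outRid, String inRid) {
        return edgesByEndpoints.getOrDefault(key(outRid, inRid), new ArrayList<>());
    }

    public static void main(String[] args) {
        EdgePairIndex index = new EdgePairIndex();
        index.add("#10:1", "#10:2", "#20:1");
        index.add("#10:1", "#10:2", "#20:2");
        System.out.println(index.between("#10:1", "#10:2"));
    }
}
```

The trade-off is the usual one for an index: every edge insert or delete has 
to maintain the map, in exchange for lookups that do not depend on vertex 
degree.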

Thanks!

-- John

> On Mar 16, 2017, at 07:24, Luigi Dell'Aquila <luigi.dellaqu...@gmail.com> 
> wrote:
> 
> Hi John,
> 
> you have two alternatives:
> 
> 1) use OrientVertex.countEdges() to check which of the two vertices has a 
> smaller number of edges. This approach is good if you know that at most one 
> is a supernode
> 
> 2) if you know that both vertices can be supernodes, then the only efficient 
> way to find the edge is to define an index on edge(out, in) and do the 
> indexed query directly
> 
> I hope it helps
> 
> Thanks
> 
> Luigi
> 
> 
> 2017-03-16 11:18 GMT+01:00 John J. Szucs <john.j.sz...@gmail.com>:
>> I need to *very quickly* find the edges that directly connect two specified 
>> vertices, using either a SQL query, the Java API, or a combination of the 
>> two. If it helps, at this point in the program, I know for a fact that the 
>> two vertices are adjacent. What I'm trying to determine is how they are 
>> adjacent.
>> 
>> The OrientVertex.getEdges(OrientVertex, Direction, String ...) extension to 
>> the Blueprints API does what I need to do functionally, but it can be quite 
>> slow if the first vertex has many edges. Looking into the source code, I 
>> found that this is because this method essentially gets *all* of the edges 
>> from the first vertex that match the direction and label criteria and then 
>> checks if they are adjacent (connect to) the second vertex.
>> 
>> I have struggled with this for days. Does anyone have a better/faster 
>> approach?
>> 
>> Thanks!
>> 
>> -- John


[orientdb] Get Edges Between Vertices (Again)

2017-03-16 Thread John J. Szucs
I need to *very quickly* find the edges that directly connect two specified 
vertices, using either a SQL query, the Java API, or a combination of the 
two. If it helps, at this point in the program, I know for a fact that the 
two vertices are adjacent. What I'm trying to determine is *how* they are 
adjacent.

The OrientVertex.getEdges(OrientVertex, Direction, String ...) extension to 
the Blueprints API does what I need to do functionally, but it can be quite 
slow if the first vertex has many edges. Looking into the source code, I 
found that this is because this method essentially gets *all* of the edges 
from the first vertex that match the direction and label criteria and then 
checks if they are adjacent (connect to) the second vertex.
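
The scan-and-filter behavior described above can be modeled in plain Java. 
This is a sketch under stated assumptions, not the OrientDB implementation: 
AdjacencyScan, Edge, and every method name here are hypothetical. It also 
shows the first alternative suggested in the replies to this post, which is 
to start the scan from whichever endpoint has fewer edges.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of scan-and-filter edge lookup; not the OrientDB API. */
class AdjacencyScan {
    /** Hypothetical edge record: two endpoints plus a label. */
    static final class Edge {
        final String out, in, label;
        Edge(String out, String in, String label) {
            this.out = out;
            this.in = in;
            this.label = label;
        }
    }

    // Every edge is listed under both of its endpoints.
    private final Map<String, List<Edge>> incident = new HashMap<>();

    void addEdge(String out, String in, String label) {
        Edge e = new Edge(out, in, label);
        incident.computeIfAbsent(out, k -> new ArrayList<>()).add(e);
        if (!out.equals(in)) {
            incident.computeIfAbsent(in, k -> new ArrayList<>()).add(e);
        }
    }

    int degree(String vertex) {
        return incident.getOrDefault(vertex, new ArrayList<>()).size();
    }

    /** Edges from 'out' to 'in' with the given label, scanning the endpoint with fewer edges. */
    List<Edge> between(String out, String in, String label) {
        String start = degree(out) <= degree(in) ? out : in;
        List<Edge> result = new ArrayList<>();
        for (Edge e : incident.getOrDefault(start, new ArrayList<>())) {
            if (e.out.equals(out) && e.in.equals(in) && e.label.equals(label)) {
                result.add(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        AdjacencyScan graph = new AdjacencyScan();
        for (int i = 0; i < 1000; i++) {
            graph.addEdge("hub", "leaf" + i, "link");
        }
        System.out.println(graph.between("hub", "leaf5", "link").size());
    }
}
```

The heuristic only helps when at most one endpoint is heavily connected; if 
both are, the scan is expensive from either side.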

I have struggled with this for days. Does anyone have a better/faster 
approach?

Thanks!

-- John



[orientdb] Counting Edges Between Vertices

2017-03-15 Thread John J. Szucs
In the spirit of giving back to the community, I am sharing a problem and 
solution that I just discovered in working with OrientDB.

For my application, I need to be able to *very* quickly determine how many 
(zero or more, often, but not always zero or one) edges with a given label 
and direction exist between two vertices.

The pure Java solution is OrientVertex.getEdges(otherVertex, direction, 
labels ...) but it returns an OMultiCollectionIterable of *all* the edges 
adjacent to the first vertex that match the label and direction criteria. 
Then it post-filters them for adjacency to the second vertex. This can be 
very slow if the first vertex has many edges that match the label and 
direction criteria.

As an alternative, if you only need the count (like I do) and need it to be 
as fast as possible, consider a SQL query like the following:

SELECT IN('myLabel')[@rid=:vertex2].SIZE() FROM :vertex1

Replace the IN function with BOTH or OUT as needed. The fragment above uses 
parameterized SQL, with the :vertex1 and :vertex2 parameters specifying the 
vertices of interest.
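
As a small illustration of swapping IN, OUT, and BOTH, the query text can be 
generated by a helper like the one below. This is only a sketch: 
EdgeCountQuery, Dir, and build are hypothetical names, and the helper does 
nothing but assemble the string shown above. Executing it and binding 
:vertex1 and :vertex2 is left to whatever OrientDB API the application 
already uses.

```java
/** Builds the edge-counting query from this post for a given direction and label. */
class EdgeCountQuery {
    enum Dir { IN, OUT, BOTH }

    static String build(Dir dir, String label) {
        // Reject quotes so the label cannot break out of the SQL string literal.
        if (label.contains("'")) {
            throw new IllegalArgumentException("label must not contain a single quote");
        }
        return "SELECT " + dir.name() + "('" + label + "')[@rid=:vertex2].SIZE() FROM :vertex1";
    }

    public static void main(String[] args) {
        System.out.println(build(Dir.IN, "myLabel"));
        System.out.println(build(Dir.BOTH, "myLabel"));
    }
}
```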

OrientDB team: Should I add this to the "query cookbook" in the 
documentation?

-- John



Re: [orientdb] Optimizing Queries for a Prefix Tree

2016-12-23 Thread John J. Szucs
Luigi,

This reduced the overall run-time for the method that uses this query by 
about 75%!

As an FYI for future readers, I believe the second "WHERE" in the suggested 
SQL statement should be an AND.
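
To make the difference between the two query shapes concrete, here is a 
plain-Java model of the key-first lookup. It is a sketch, not OrientDB code: 
TokenKeyLookup, Token, and the method names are hypothetical. The byKey map 
stands in for the index on Token.key, and the loop stands in for the parent 
check in the corrected statement.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of a key-first child lookup in a prefix tree; not the OrientDB API. */
class TokenKeyLookup {
    /** Hypothetical token record: its own RID, its key, and its parent's RID. */
    static final class Token {
        final String rid, key, parentRid;
        Token(String rid, String key, String parentRid) {
            this.rid = rid;
            this.key = key;
            this.parentRid = parentRid;
        }
    }

    // Stands in for the non-unique index on Token.key.
    private final Map<String, List<Token>> byKey = new HashMap<>();

    void add(String rid, String key, String parentRid) {
        byKey.computeIfAbsent(key, k -> new ArrayList<>()).add(new Token(rid, key, parentRid));
    }

    /**
     * Mirrors: SELECT FROM YourClass WHERE key = :k AND in('Child') contains :p
     * The index narrows by key first; only those candidates are checked against the parent.
     */
    Token child(String parentRid, String key) {
        for (Token t : byKey.getOrDefault(key, Collections.<Token>emptyList())) {
            if (parentRid.equals(t.parentRid)) {
                return t;
            }
        }
        return null; // no child of this parent has the requested key
    }

    public static void main(String[] args) {
        TokenKeyLookup tree = new TokenKeyLookup();
        tree.add("#1:1", "hello", "#1:0");
        tree.add("#1:2", "world", "#1:1");
        System.out.println(tree.child("#1:1", "world").rid);
    }
}
```

The work done is proportional to the number of tokens sharing the key rather 
than the number of children of the parent, which is why it helps at the root 
and can be slightly slower for nodes with few children.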

Thank you very much for the quick and helpful response!

-- John

On Friday, December 23, 2016 at 2:33:49 AM UTC-5, Luigi Dell'Aquila wrote:
>
> Hi John,
>
> You can try with the following:
>
> SELECT FROM YourClass where key = :k where in('Child') contains :p
>
> in a situation where you have supernodes it will be faster, but of course 
> it will be a bit slower when you have small fan out
>
> Thanks
>
> Luigi
>
>
> 2016-12-22 20:57 GMT+01:00 John J. Szucs <john.j...@gmail.com 
> >:
>
>> In my project, I have a fairly large prefix tree, potentially containing 
>> millions of nodes (about 250K nodes in my development instance), managed in 
>> OrientDB (pointing to other vertices in my graph).
>>
>> The nodes of the prefix tree are represented by a Token vertex type. Each 
>> Token has a 'key' property and is connected to its child vertices by a 
>> 'child' edge type. So, a sequence like "hello world" would be represented 
>> as:
>>
>> root -child-> "hello" -child-> "world"
>>
>>
>> Currently, I have a NOTUNIQUE_HASH_INDEX on Token.key and I am querying 
>> the data structure like this:
>>
>> SELECT EXPAND(OUT('child')[key=:k]) FROM :p
>>
>>
>> where *k* is the child key I am looking for and *p* is the RID of the 
>> parent node.
>>
>> Generally, performance is pretty good, but I am looking for ideas on 
>> improving the query, the indexing, or both for this use case. In 
>> particular, queries starting at the root node, which has many children, 
>> take noticeably longer than the other, less-connected nodes.
>>
>> Any suggestions? Thanks in advance!
>>
>> -- John
>>
>
>



[orientdb] Optimizing Queries for a Prefix Tree

2016-12-22 Thread John J. Szucs
In my project, I have a fairly large prefix tree, potentially containing 
millions of nodes (about 250K nodes in my development instance), managed in 
OrientDB (pointing to other vertices in my graph).

The nodes of the prefix tree are represented by a Token vertex type. Each 
Token has a 'key' property and is connected to its child vertices by a 
'child' edge type. So, a sequence like "hello world" would be represented 
as:

root -child-> "hello" -child-> "world"


Currently, I have a NOTUNIQUE_HASH_INDEX on Token.key and I am querying the 
data structure like this:

SELECT EXPAND(OUT('child')[key=:k]) FROM :p


where *k* is the child key I am looking for and *p* is the RID of the 
parent node.

Generally, performance is pretty good, but I am looking for ideas on 
improving the query, the indexing, or both for this use case. In 
particular, queries starting at the root node, which has many children, 
take noticeably longer than the other, less-connected nodes.

Any suggestions? Thanks in advance!

-- John



Re: [orientdb] Re: ORecordDuplicatedException with UNIQUE_HASH_INDEX and collate=ci

2016-10-01 Thread John J. Szucs
Bump.

Any further thoughts or progress on this topic?

The workarounds for this issue are spreading through my code base like an 
infection and I'd like to be able to cure the disease instead of the 
symptoms.

-- John

On Tuesday, September 6, 2016 at 2:29:24 PM UTC-4, John J. Szucs wrote:
>
> Andrey,
>
> Sorry for the delayed response. I had to focus on a milestone, which was 
> followed by a holiday weekend here in the US.
>
> The exception is thrown from OIndexUnique.put(Object, OIdentifiable). In 
> this particular test run, the existing record ID (in the variable "value" 
> in this method) is #85:12. The new record ID (in the variable 
> "iSingleValue") is #100:11.
>
> Both of those records have a value of 
> https://en.wikipedia.org/wiki/fédération_anarchiste for 
> Identifier.identifier.
>
> In this area of the code, there is a statement that checks for a 
> "mergeKeys" property in the index's metadata, but the metadata is null when 
> this happens.
>
> Looking at the problem from another angle, this problem occurs in the 
> context of a fairly large transaction. As you may have gathered, I am 
> ingesting data from Wikipedia (or other MediaWiki-based wikis). Each page 
> and all of its links (specifically, hyperlinks and narrower/broader 
> category links) is processed in a single transaction.
>
> Often (especially early in an import, for obvious reasons) those links 
> refer to other pages which I have not yet ingested, so I create a stub 
> Identifier vertex for them. In this particular example case, I have created 
> such a stub Identifier vertex for the URI in question *in the scope of a 
> still-pending transaction*.
>
> The stack trace also seems to suggest that this may be related to the 
> transaction context because what the app is actually trying to do is just 
> create an edge between two vertices that, as far as the app is concerned, 
> already exist. Looking at the stack trace below, though, you can see that 
> this makes OrientDB try to commit a pending index transaction that, for 
> some reason, duplicates an existing index entry.
>
> com.orientechnologies.orient.core.storage.ORecordDuplicatedException: 
> Cannot index record #100:11: found duplicated key '
> https://en.wikipedia.org/wiki/f%c3%a9d%c3%a9ration_anarchiste' in index 
> 'Identifier.identifier' previously assigned to the record #85:12
> DB name="kb" INDEX=Identifier.identifier RID=#85:12
> at 
> com.orientechnologies.orient.core.index.OIndexUnique.put(OIndexUnique.java:64)
> at 
> com.orientechnologies.orient.core.index.OIndexUnique.put(OIndexUnique.java:34)
> at 
> com.orientechnologies.orient.core.index.OIndexAbstract.putInSnapshot(OIndexAbstract.java:930)
> at 
> com.orientechnologies.orient.core.index.OIndexAbstract.applyIndexTxEntry(OIndexAbstract.java:762)
> at 
> com.orientechnologies.orient.core.index.OIndexAbstract.addTxOperation(OIndexAbstract.java:735)
> at 
> com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commitIndexes(OAbstractPaginatedStorage.java:1499)
> at 
> com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commit(OAbstractPaginatedStorage.java:1464)
> at 
> com.orientechnologies.orient.core.tx.OTransactionOptimistic.doCommit(OTransactionOptimistic.java:566)
> at 
> com.orientechnologies.orient.core.tx.OTransactionOptimistic.commit(OTransactionOptimistic.java:106)
> at 
> com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.commit(ODatabaseDocumentTx.java:2733)
> at 
> com.tinkerpop.blueprints.impls.orient.OrientBaseGraph.executeOutsideTx(OrientBaseGraph.java:1770)
> at 
> com.tinkerpop.blueprints.impls.orient.OrientBaseGraph.createEdgeType(OrientBaseGraph.java:1434)
> at 
> com.tinkerpop.blueprints.impls.orient.OrientBaseGraph.createEdgeType(OrientBaseGraph.java:1385)
> at 
> com.tinkerpop.blueprints.impls.orient.OrientBaseGraph.createEdgeType(OrientBaseGraph.java:1360)
> at 
> com.tinkerpop.blueprints.impls.orient.OrientGraph.addEdgeInternal(OrientGraph.java:318)
> at 
> com.tinkerpop.blueprints.impls.orient.OrientVertex.addEdge(OrientVertex.java:717)
> at 
> com.tinkerpop.blueprints.impls.orient.OrientVertex.addEdge(OrientVertex.java:656)
> ...  ...
>
> BTW, I upgraded to OrientDB 2.2.9 before running the latest tests, just to 
> be current. There is no change in the behavior regarding this issue.
>
> Is this input helpful? Do you have any further insights as to a fix or 
> work-around?
>
> -- John
>
> On Thursday, September 1, 2016 at 11:34:27 AM UTC-4, John 

Re: [orientdb] Re: ORecordDuplicatedException with UNIQUE_HASH_INDEX and collate=ci

2016-09-06 Thread John J. Szucs
Andrey,

Sorry for the delayed response. I had to focus on a milestone, which was 
followed by a holiday weekend here in the US.

The exception is thrown from OIndexUnique.put(Object, OIdentifiable). In 
this particular test run, the existing record ID (in the variable "value" 
in this method) is #85:12. The new record ID (in the variable 
"iSingleValue") is #100:11.

Both of those records have a value of 
https://en.wikipedia.org/wiki/fédération_anarchiste for 
Identifier.identifier.

In this area of the code, there is a statement that checks for a 
"mergeKeys" property in the index's metadata, but the metadata is null when 
this happens.

Looking at the problem from another angle, this problem occurs in the 
context of a fairly large transaction. As you may have gathered, I am 
ingesting data from Wikipedia (or other MediaWiki-based wikis). Each page 
and all of its links (specifically, hyperlinks and narrower/broader 
category links) is processed in a single transaction.

Often (especially early in an import, for obvious reasons) those links 
refer to other pages which I have not yet ingested, so I create a stub 
Identifier vertex for them. In this particular example case, I have created 
such a stub Identifier vertex for the URI in question *in the scope of a 
still-pending transaction*.

The stack trace also seems to suggest that this may be related to the 
transaction context because what the app is actually trying to do is just 
create an edge between two vertices that, as far as the app is concerned, 
already exist. Looking at the stack trace below, though, you can see that 
this makes OrientDB try to commit a pending index transaction that, for 
some reason, duplicates an existing index entry.

com.orientechnologies.orient.core.storage.ORecordDuplicatedException: 
Cannot index record #100:11: found duplicated key 
'https://en.wikipedia.org/wiki/f%c3%a9d%c3%a9ration_anarchiste' in index 
'Identifier.identifier' previously assigned to the record #85:12
DB name="kb" INDEX=Identifier.identifier RID=#85:12
at 
com.orientechnologies.orient.core.index.OIndexUnique.put(OIndexUnique.java:64)
at 
com.orientechnologies.orient.core.index.OIndexUnique.put(OIndexUnique.java:34)
at 
com.orientechnologies.orient.core.index.OIndexAbstract.putInSnapshot(OIndexAbstract.java:930)
at 
com.orientechnologies.orient.core.index.OIndexAbstract.applyIndexTxEntry(OIndexAbstract.java:762)
at 
com.orientechnologies.orient.core.index.OIndexAbstract.addTxOperation(OIndexAbstract.java:735)
at 
com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commitIndexes(OAbstractPaginatedStorage.java:1499)
at 
com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commit(OAbstractPaginatedStorage.java:1464)
at 
com.orientechnologies.orient.core.tx.OTransactionOptimistic.doCommit(OTransactionOptimistic.java:566)
at 
com.orientechnologies.orient.core.tx.OTransactionOptimistic.commit(OTransactionOptimistic.java:106)
at 
com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.commit(ODatabaseDocumentTx.java:2733)
at 
com.tinkerpop.blueprints.impls.orient.OrientBaseGraph.executeOutsideTx(OrientBaseGraph.java:1770)
at 
com.tinkerpop.blueprints.impls.orient.OrientBaseGraph.createEdgeType(OrientBaseGraph.java:1434)
at 
com.tinkerpop.blueprints.impls.orient.OrientBaseGraph.createEdgeType(OrientBaseGraph.java:1385)
at 
com.tinkerpop.blueprints.impls.orient.OrientBaseGraph.createEdgeType(OrientBaseGraph.java:1360)
at 
com.tinkerpop.blueprints.impls.orient.OrientGraph.addEdgeInternal(OrientGraph.java:318)
at 
com.tinkerpop.blueprints.impls.orient.OrientVertex.addEdge(OrientVertex.java:717)
at 
com.tinkerpop.blueprints.impls.orient.OrientVertex.addEdge(OrientVertex.java:656)
...  ...

BTW, I upgraded to OrientDB 2.2.9 before running the latest tests, just to 
be current. There is no change in the behavior regarding this issue.

Is this input helpful? Do you have any further insights as to a fix or 
work-around?

-- John

On Thursday, September 1, 2016 at 11:34:27 AM UTC-4, John J. Szucs wrote:
>
> Andrey,
>
> I am up against a deadline today (using my case-folding work-around for 
> now) and time zone differences are working against us. I will get back to 
> you with the results of this test tomorrow or over the weekend.
>
> Thanks for your patience!
>
> -- John
>
> On Thursday, September 1, 2016 at 5:41:42 AM UTC-4, Andrey Lomakin wrote:
>>
>> Hi John,
>>
>> Strange issue. 
>> Could you do the following:
>>
>> 1. Get the source code of the database.
>> 2. Set a breakpoint on ORecordDuplicatedException and check the values of 
>> the new and existing records when the exception is about to be thrown
>>
>> WDYT ?

Re: [orientdb] Re: ORecordDuplicatedException with UNIQUE_HASH_INDEX and collate=ci

2016-09-01 Thread John J. Szucs
Andrey,

I am up against a deadline today (using my case-folding work-around for 
now) and time zone differences are working against us. I will get back to 
you with the results of this test tomorrow or over the weekend.

Thanks for your patience!

-- John

On Thursday, September 1, 2016 at 5:41:42 AM UTC-4, Andrey Lomakin wrote:
>
> Hi John,
>
> Strange issue. 
> Could you do the following:
>
> 1. Get the source code of the database.
> 2. Set a breakpoint on ORecordDuplicatedException and check the values of 
> the new and existing records when the exception is about to be thrown
>
> WDYT ?
>
>
> On Wed, Aug 31, 2016 at 6:39 PM John J. Szucs <john.j...@gmail.com 
> > wrote:
>
>> Andrey,
>>
>> Thanks for responding.
>>
>> The RID in question changes every time I run this test case. Here are 
>> some results with my current run. The way that my environment is set up, I 
>> can't really run the OrientDB console or Studio tool, so I wrote a little 
>> "db" command in my app that allows me to execute SQL commands for 
>> testing/debugging. You can see this being used below.
>>
>> com.orientechnologies.orient.core.storage.ORecordDuplicatedException: 
>> Cannot index record #100:14: found duplicated key '
>> https://en.wikipedia.org/wiki/fédération_anarchiste' in index 
>> 'Identifier.identifier' previously assigned to the record #85:13
>> ...
>> db "select * from #85:13"
>> 0 results. 
>>
>>  
>>
>> db "select * from #100:14"
>> 0 results.
>>
>>
>> db "select * from Identifier"
>> Identifier#81:0{identifier:https://en.wikipedia.org/wiki/AccessibleComputing,out_id:[size=1]} v2
>> Identifier#82:0{identifier:https://en.wikipedia.org/wiki/Computer_accessibility,out_id:[size=1]} v1
>> Identifier#83:0{identifier:https://en.wikipedia.org/wiki/Anarchism,out_id:[size=1]} v1
>> Identifier#84:0{identifier:https://en.wikipedia.org/wiki/political_philosophy,out_id:[size=1]} v1
>> Identifier#85:0{identifier:https://en.wikipedia.org/wiki/AfghanistanHistory,out_id:[size=1]} v1
>> Identifier#86:0{identifier:https://en.wikipedia.org/wiki/History_of_Afghanistan,out_id:[size=1]} v1
>> 6 results.
>>
>>
>> Note that neither #85:13 nor #100:14 appears to have actually been 
>> committed to the database.
>>
>> -- John
>>
>> On Wednesday, August 31, 2016 at 5:43:04 AM UTC-4, Andrey Lomakin wrote:
>>
>>> Hi John,
>>>
>>> Could you send us the content of the record with RID #109:13 (the value 
>>> of the indexed field will be enough, I think)?
>>>
>>> On Tue, Aug 30, 2016 at 7:02 PM John J. Szucs <john.j...@gmail.com> 
>>> wrote:
>>>
>>>> Thanks for pointing that out. I double-checked the actual code and it is 
>>>> using the correct "collate"="ci" Parameter pair for the Java API.
>>>>
>>>>
>>>> On Tuesday, August 30, 2016 at 11:54:21 AM UTC-4, alessand...@gmail.com 
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>> this will not solve the problem but I think the correct command for 
>>>>> creating an index with case insensitive is collate
>>>>>
>>>>> CREATE INDEX <name> [ON <class-name> (prop-names [COLLATE <collate>])] 
>>>>> <type> [<key-type>] [METADATA {JSON Index Metadata Document}]
>>>>>
>>>>> Example:
>>>>>
>>>>> create index User.name on User (name collate ci) UNIQUE
>>>>>
>>>>> Kind regards,
>>>>> Alessandro
>>>>>
>>> -- 
>>> Best regards,
>>> Andrey Lomakin, R&D lead. 
>>> OrientDB Ltd
>>>
>>> twitter: @Andrey_Lomakin 
>>> linkedin: https://ua.linkedin.com/in/andreylomakin
>>> blogger: http://andreylomakin.blogspot.com/ 
>>>
> -- 
> Best regards,
> Andrey Lomakin, R&D lead. 
> OrientDB Ltd
>
> twitter: @Andrey_Lomakin 
> linkedin: https://ua.linkedin.com/in/andreylomakin
> blogger: http://andreylomakin.blogspot.com/ 
>



Re: [orientdb] Re: ORecordDuplicatedException with UNIQUE_HASH_INDEX and collate=ci

2016-08-31 Thread John J. Szucs
Andrey,

Thanks for responding.

The RID in question changes every time I run this test case. Here are some 
results with my current run. The way that my environment is set up, I can't 
really run the OrientDB console or Studio tool, so I wrote a little "db" 
command in my app that allows me to execute SQL commands for 
testing/debugging. You can see this being used below.

com.orientechnologies.orient.core.storage.ORecordDuplicatedException: 
Cannot index record #100:14: found duplicated key 
'https://en.wikipedia.org/wiki/fédération_anarchiste' in index 
'Identifier.identifier' previously assigned to the record #85:13
...
db "select * from #85:13"
0 results. 

 

db "select * from #100:14"
0 results.


db "select * from Identifier"
Identifier#81:0{identifier:https://en.wikipedia.org/wiki/AccessibleComputing,out_id:[size=1]} v2
Identifier#82:0{identifier:https://en.wikipedia.org/wiki/Computer_accessibility,out_id:[size=1]} v1
Identifier#83:0{identifier:https://en.wikipedia.org/wiki/Anarchism,out_id:[size=1]} v1
Identifier#84:0{identifier:https://en.wikipedia.org/wiki/political_philosophy,out_id:[size=1]} v1
Identifier#85:0{identifier:https://en.wikipedia.org/wiki/AfghanistanHistory,out_id:[size=1]} v1
Identifier#86:0{identifier:https://en.wikipedia.org/wiki/History_of_Afghanistan,out_id:[size=1]} v1
6 results.


Note that neither #85:13 nor #100:14 appears to have actually been 
committed to the database.

-- John

On Wednesday, August 31, 2016 at 5:43:04 AM UTC-4, Andrey Lomakin wrote:

> Hi John,
>
> Could you send us the content of the record with RID #109:13 (the value of 
> the indexed field will be enough, I think)?
>
> On Tue, Aug 30, 2016 at 7:02 PM John J. Szucs <john.j...@gmail.com 
> > wrote:
>
>> Thanks for pointing that out. I double-checked the actual code and it is 
>> using the correct "collate"="ci" Parameter pair for the Java API.
>>
>>
>> On Tuesday, August 30, 2016 at 11:54:21 AM UTC-4, alessand...@gmail.com 
>> wrote:
>>>
>>> Hi,
>>> this will not solve the problem but I think the correct command for 
>>> creating an index with case insensitive is collate
>>>
>>> CREATE INDEX <name> [ON <class-name> (prop-names [COLLATE <collate>])] 
>>> <type> [<key-type>] [METADATA {JSON Index Metadata Document}]
>>>
>>> Example:
>>>
>>> create index User.name on User (name collate ci) UNIQUE
>>>
>>> Kind regards,
>>> Alessandro
>>>
> -- 
> Best regards,
> Andrey Lomakin, R&D lead. 
> OrientDB Ltd
>
> twitter: @Andrey_Lomakin 
> linkedin: https://ua.linkedin.com/in/andreylomakin
> blogger: http://andreylomakin.blogspot.com/ 
>



[orientdb] Re: ORecordDuplicatedException with UNIQUE_HASH_INDEX and collate=ci

2016-08-30 Thread John J. Szucs
Thanks for pointing that out. I double-checked the actual code and it is 
using the correct "collate"="ci" Parameter pair for the Java API.

On Tuesday, August 30, 2016 at 11:54:21 AM UTC-4, alessand...@gmail.com 
wrote:
>
> Hi,
> this will not solve the problem but I think the correct command for 
> creating an index with case insensitive is collate
>
> CREATE INDEX <name> [ON <class-name> (prop-names [COLLATE <collate>])] 
> <type> [<key-type>] [METADATA {JSON Index Metadata Document}]
>
> Example:
>
> create index User.name on User (name collate ci) UNIQUE
>
> Kind regards,
> Alessandro
>



[orientdb] ORecordDuplicatedException with UNIQUE_HASH_INDEX and collate=ci

2016-08-30 Thread John J. Szucs
I have an OrientDB application using the Java (Tinkerpop) API against 
OrientDB 2.2.3 running on Linux. While importing data from MediaWiki XML 
dumps (e.g., Wikipedia), I need to do an INSERT-IF-NOT-EXISTS type 
operation. I ran into an issue earlier and it appeared to be working 
beautifully after some help from Luca.

After working with the data for a little while, I discovered that my 
identifiers (which are actually URIs, like 
https://en.wikipedia.org/wiki/OrientDB) need to be indexed in a 
case-insensitive manner, because MediaWiki is inconsistent (or "flexible," 
depending on your perspective) about casing in URIs.

So, I modified the creation of my UNIQUE_HASH_MAP index to include the 
"collation=ci" parameter.

Now, as I am parsing the MediaWiki XML and loading the pages (as vertices) 
and links (as edges) into an OrientDB-based graph, I get an 
ORecordDuplicatedException:

com.orientechnologies.orient.core.storage.ORecordDuplicatedException: 
Cannot index record #124:14: found duplicated key 
'https://en.wikipedia.org/wiki/fédération_anarchiste' in index 
'Identifier.identifier' previously assigned to the record #109:13
DB name="kb" INDEX=Identifier.identifier RID=#109:13
at 
com.orientechnologies.orient.core.index.OIndexUnique.put(OIndexUnique.java:64)
at 
com.orientechnologies.orient.core.index.OIndexUnique.put(OIndexUnique.java:34)
at 
com.orientechnologies.orient.core.index.OIndexAbstract.putInSnapshot(OIndexAbstract.java:911)
at 
com.orientechnologies.orient.core.index.OIndexAbstract.applyIndexTxEntry(OIndexAbstract.java:756)
at 
com.orientechnologies.orient.core.index.OIndexAbstract.addTxOperation(OIndexAbstract.java:729)
at 
com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commitIndexes(OAbstractPaginatedStorage.java:1387)
at 
com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commit(OAbstractPaginatedStorage.java:1348)
at 
com.orientechnologies.orient.core.tx.OTransactionOptimistic.doCommit(OTransactionOptimistic.java:555)
at 
com.orientechnologies.orient.core.tx.OTransactionOptimistic.commit(OTransactionOptimistic.java:109)
at 
com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.commit(ODatabaseDocumentTx.java:2665)
at 
com.tinkerpop.blueprints.impls.orient.OrientBaseGraph.executeOutsideTx(OrientBaseGraph.java:1824)
at 
com.tinkerpop.blueprints.impls.orient.OrientBaseGraph.createEdgeType(OrientBaseGraph.java:1481)
at 
com.tinkerpop.blueprints.impls.orient.OrientBaseGraph.createEdgeType(OrientBaseGraph.java:1424)
at 
com.tinkerpop.blueprints.impls.orient.OrientBaseGraph.createEdgeType(OrientBaseGraph.java:1395)
at 
com.tinkerpop.blueprints.impls.orient.OrientGraph.addEdgeInternal(OrientGraph.java:318)
at 
com.tinkerpop.blueprints.impls.orient.OrientVertex.addEdge(OrientVertex.java:717)
at 
com.tinkerpop.blueprints.impls.orient.OrientVertex.addEdge(OrientVertex.java:656)

Here are some experiments that I have conducted and their results:

1) If I remove the "collation=ci" parameter from the index, the exception 
does not occur. Of course, the identifiers then become case-sensitive and I 
end up with multiple vertices for different casings of the same URI.

2) Although my example above shows a URI with accented characters, that 
appears to be coincidental. It is just the first URI in my data set that 
this problem happens to occur with. I have written unit tests around URIs 
containing accented characters (technically IRIs) and they all pass.

3) The results are exactly the same if I try this with the SB-Tree index 
(index type=UNIQUE instead of UNIQUE_HASH_INDEX, collation=ci).

I would strongly prefer not to simply case-fold the URIs, because 
https://en.wikipedia.org/wiki/OrientDB is the correct, canonical English 
Wikipedia URI for OrientDB; https://en.wikipedia.org/wiki/orientdb is not.
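
To make the requirement concrete: lookups should ignore case, but the stored 
URI should keep whatever casing it was first recorded with. The following is a 
minimal sketch in plain Java, not OrientDB code. The class CanonicalUriIndex 
and its methods are hypothetical names used only for illustration, and 
lower-casing with Locale.ROOT is an assumed stand-in for whatever folding the 
"ci" collation actually applies.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Toy case-insensitive unique index: folded key -> URI exactly as first stored. */
public class CanonicalUriIndex {
    private final Map<String, String> byFoldedKey = new HashMap<>();

    /** Folds a URI for comparison only; the caller's string is left untouched. */
    public static String fold(String uri) {
        return uri.toLowerCase(Locale.ROOT);
    }

    /** Returns the URI already stored under the folded key, or stores and returns this one. */
    public String getOrCreate(String uri) {
        String existing = byFoldedKey.putIfAbsent(fold(uri), uri);
        return existing != null ? existing : uri;
    }

    /** Number of distinct identifiers, ignoring case. */
    public int size() {
        return byFoldedKey.size();
    }

    public static void main(String[] args) {
        CanonicalUriIndex index = new CanonicalUriIndex();
        System.out.println(index.getOrCreate("https://en.wikipedia.org/wiki/OrientDB"));
        // A differently cased variant resolves to the first stored form.
        System.out.println(index.getOrCreate("https://en.wikipedia.org/wiki/orientdb"));
        System.out.println(index.size());
    }
}
```

In this sketch the first casing seen wins, so the canonical form has to be 
inserted before any variant; a real loader might need a different rule.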

Any suggestions on this? Am I doing something wrong? Is it a bug? Is there 
a work-around?

-- John

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to orient-database+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [orientdb] Indexing and Queries with Java API

2016-08-18 Thread John J. Szucs


---
John J. Szucs (on my iPhone)

> On Aug 18, 2016, at 17:07, Luca Garulli <l.garu...@orientdb.com> wrote:
> 
> Cool, you solved it.
> 
> Anyway we have to improve the docs, because I'm sure many users just drop 
> OrientDB after the first problem and maybe it's something trivial like this 
> ;-)
> 
> 
> Best Regards,
> 
> Luca Garulli
> Founder & CEO
> OrientDB LTD
> 
> Want to share your opinion about OrientDB?
> Rate & review us at Gartner's Software Review
> 
> 
Re: [orientdb] Indexing and Queries with Java API

2016-08-18 Thread John J. Szucs
New issue opened 
at https://github.com/orientechnologies/orientdb/issues/6589.

BTW, the performance test results I shared yesterday were from runs under 
the debugger and with extensive instrumentation. Here are the "clean" 
results. Wow!

Created 10,000 entities in 00:00:05.840, 1712.33 per second
Retrieving 10,000 entities...
Retrieved 10,000 entities in 00:00:01.561, 6406.15 per second
Deleting 10,000 entities...
Deleted 10,000 entities in 00:00:01.960, 5102.04 per second
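
The per-second figures follow from dividing the entity count by the elapsed 
time, and they work out exactly for a count of 10,000. A small sketch of that 
arithmetic in plain Java; the class name ThroughputCheck and its methods are 
made up for this illustration.

```java
import java.time.Duration;
import java.util.Locale;

public class ThroughputCheck {
    /** Entities per second for a given count and elapsed time. */
    public static double perSecond(long count, Duration elapsed) {
        return count * 1000.0 / elapsed.toMillis();
    }

    /** Rate rounded to two decimals, matching the format used in the results. */
    public static String format(long count, Duration elapsed) {
        return String.format(Locale.ROOT, "%.2f", perSecond(count, elapsed));
    }

    public static void main(String[] args) {
        long count = 10_000; // count implied by the reported times and rates
        System.out.println(format(count, Duration.ofMillis(5_840))); // created:   1712.33
        System.out.println(format(count, Duration.ofMillis(1_561))); // retrieved: 6406.15
        System.out.println(format(count, Duration.ofMillis(1_960))); // deleted:   5102.04
    }
}
```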

Thanks again!

-- John

On Wednesday, August 17, 2016 at 11:55:51 PM UTC-4, l.garulli wrote:
>
> Hi John,
>
> Happy to help. Yes, please, could you open a new issue for the 
> documentation?
>
> Best Regards,
>
> Luca Garulli
> Founder & CEO
> OrientDB LTD <http://orientdb.com/>
>

Re: [orientdb] Indexing and Queries with Java API

2016-08-17 Thread John J. Szucs
Luca,

I just tried this. The only change was:

Iterable<Vertex> vertices=graph.getVertices("identifier", myUriStr);

to:

Iterable<Vertex> vertices=graph.getVertices("Identifier.identifier", myUriStr);


The results speak for themselves:

Created 10,000 entities in 00:02:05.755, 79.52 per second


This is the kind of performance I was expecting!

Thank you!!!

I will note that this was a very subtle change. Essentially, it seems that 
for the graph API's getVertices() method to use the indices, the property 
names have to be qualified with the vertex type name. Would you like for me 
to add an issue on GitHub to improve the documentation around this?
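
Why the unqualified key was so much slower can be pictured with a toy model. 
This is plain Java, not OrientDB's implementation. It assumes that a lookup 
which misses the index falls back to examining every vertex, which would fit 
the throughput drop-off reported earlier in the thread but is not confirmed 
here. ToyVertexStore, findByScan, and findByIndex are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ToyVertexStore {
    private final List<String> vertices = new ArrayList<>();    // all stored identifiers
    private final Map<String, Integer> index = new HashMap<>(); // identifier -> position
    private long comparisons = 0;                               // work done by scans

    /** Lookup without an index: examines vertices one by one. */
    public Integer findByScan(String identifier) {
        for (int i = 0; i < vertices.size(); i++) {
            comparisons++;
            if (vertices.get(i).equals(identifier)) {
                return i;
            }
        }
        return null;
    }

    /** Lookup through the index: one hash probe regardless of store size. */
    public Integer findByIndex(String identifier) {
        return index.get(identifier);
    }

    /** INSERT-IF-NOT-EXISTS; returns the position of the existing or new vertex. */
    public int getOrCreate(String identifier, boolean useIndex) {
        Integer found = useIndex ? findByIndex(identifier) : findByScan(identifier);
        if (found != null) {
            return found;
        }
        vertices.add(identifier);
        index.put(identifier, vertices.size() - 1);
        return vertices.size() - 1;
    }

    public long comparisons() {
        return comparisons;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 2_000}) {
            ToyVertexStore store = new ToyVertexStore();
            for (int i = 0; i < n; i++) {
                store.getOrCreate("http://example.org/" + i, false);
            }
            // Doubling n roughly quadruples the total scan work: n * (n - 1) / 2.
            System.out.println(n + " inserts by scan: " + store.comparisons() + " comparisons");
        }
    }
}
```

In the model, each scan-based insert costs more than the one before it, so 
throughput falls as the store grows, while the indexed path stays flat.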

Thanks again!

-- John

On Tuesday, August 16, 2016 at 7:01:10 PM UTC-4, l.garulli wrote:
>
> It looks like you're not using the index from the Graph API. Look at the 
> documentation:
>
>
> http://orientdb.com/docs/last/Performance-Tuning-Graph.html#use-indexes-to-lookup-vertices-by-an-id
>
> If it's not clear, please write here again, we will help you on this ;-)
>
> Best Regards,
>
> Luca Garulli
> Founder & CEO
> OrientDB LTD <http://orientdb.com/>
>
> On 16 August 2016 at 17:26, John J. Szucs <john.j...@gmail.com> wrote:
>
>> In my OrientDB-based application, I need to do an INSERT-IF-NOT-EXISTS 
>> operation using the Java (TinkerPop) API.
>>
>> I have created a vertex type "Identifier." It has a single property, 
>> "identifier," which contains a URI (effectively a String for purposes of 
>> this discussion).
>>
>> I have also created an index like this:
>>
>> ParametersBuilder builder=new ParametersBuilder();
>> builder.add("class", "Identifier");
>> builder.add("type", "UNIQUE_HASH_INDEX");
>> graph.createKeyIndex("identifier", Vertex.class, builder.build());
>>
>>
>> Then, I perform the INSERT-IF-NOT-EXISTS operation in a loop like this. 
>> This snippet is using the Google Guava libraries and is obviously a 
>> simplification of our real application:
>>
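>> int n=10000; // number of identifiers to insert
>> for (int i=0; i<n; i++)
>> {
>>     // i is a primitive int, so concatenate it rather than calling toString()
>>     String myUriStr="http://example.org/"+i;
>>     Iterable<Vertex> vertices=graph.getVertices("identifier", myUriStr);
>>     // getOnlyElement(iterable) throws on an empty result; pass null as the default
>>     Vertex vertex=Iterables.getOnlyElement(vertices, null);
>>     if (null==vertex)
>>     {
>>         // Create vertex
>>         ...
>>     }
>>     // Use vertex
>>     ...
>> }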
>>
>>
>> What I am seeing is that the throughput of this loop rapidly diminishes 
>> as more vertices are added, like this (with the throughput relative to the 
>> n=1,000 baseline):
>>
>>
>> n=1,000 throughput=100%
>> n=2,000 throughput=58.8%
>> n=5,000 throughput=29.7%
>> n=10,000 throughput=16.5%
>>
>>
>> This obviously suggests that indexing is not working, so I tried a SQL 
>> EXPLAIN command.
>>
>> explain select from identifier where identifier='http://example.org/1'
>> documentReads=1
>> fullySortedByIndex=false
>> documentAnalyzedCompatibleClass=1
>> recordReads=1
>> fetchingFromTargetElapsed=0
>> indexIsUsedInOrderBy=false
>> compositeIndexUsed=1
>> current=Identifier#153:0{identifier:http://example.org/1,out_id:[size=1]} v2
>> involvedIndexes=[Identifier.identifier]
>> limit=-1
>> evaluated=1
>> user=#5:0
>> elapsed=2.387001
>> resultType=collection
>> resultSize=1 
>>  
>>
>> The documentation at http://orientdb.com/docs/master/SQL-Explain.html does 
>> not seem to be 100% current on how to interpret the output of the EXPLAIN 
>> command, but my interpretation is that the query did recognize and use the 
>> index that I created.
>>
>> I also tried some profiling (with JProfiler) and see a hot spot 
>> at com.tinkerpop.blueprints.impls.orient.OrientElementIterator.hasNext.
>>
>> All of this is with OrientDB running in embedded mode, on a fairly 
>> high-end Linux machine and with a fresh, empty database at the beginning of 
>> each test.
>>
>> I have to believe I am doing something wrong to see such a rapid drop-off 
>> in query performance under such relatively small data volumes.
>>
>> I have been struggling with this for several days off-and-on now and it's 
>> time to ask for help. Has anyone else encountered a similar issue? What can 
>> I do to address this?
>>
>> Thanks in advance!
>>
>> -- John
>>
>
>



[orientdb] Indexing and Queries with Java API

2016-08-16 Thread John J. Szucs
In my OrientDB-based application, I need to do an INSERT-IF-NOT-EXISTS 
operation using the Java (TinkerPop) API.

I have created a vertex type "Identifier." It has a single property, 
"identifier," which contains a URI (effectively a String for purposes of 
this discussion).

I have also created an index like this:

ParametersBuilder builder=new ParametersBuilder();
builder.add("class", "Identifier");
builder.add("type", "UNIQUE_HASH_INDEX");
graph.createKeyIndex("identifier", Vertex.class, builder.build());


Then, I perform the INSERT-IF-NOT-EXISTS operation in a loop like this. 
This snippet is using the Google Guava libraries and is obviously a 
simplification of our real application:

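int n=10000; // number of identifiers to insert
for (int i=0; i<n; i++)
{
    // i is a primitive int, so concatenate it rather than calling toString()
    String myUriStr="http://example.org/"+i;
    Iterable<Vertex> vertices=graph.getVertices("identifier", myUriStr);
    // getOnlyElement(iterable) throws on an empty result; pass null as the default
    Vertex vertex=Iterables.getOnlyElement(vertices, null);
    if (null==vertex)
    {
        // Create vertex
        ...
    }
    // Use vertex
    ...
}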

[orientdb] Re: Schema and Indexing with Java API

2015-01-28 Thread John J. Szucs
I did discover that documentation, but it doesn't seem comprehensive. For 
example, how does one create a case-insensitive index with the Java API? 
How does createKeyIndex relate to OIndexManager and its methods?

I can puzzle it out from the source code in some cases, but first-class 
documentation for the Java API would be very helpful.

On Tuesday, January 27, 2015 at 6:12:07 PM UTC-5, JR wrote:

 Hi John,

 I'm using this: 
 http://www.orientechnologies.com/docs/2.0/orientdb.wiki/Java-API.html

 hope this helps,
 L





[orientdb] Schema and Indexing with Java API

2015-01-27 Thread John J. Szucs
Can anyone point me to good documentation on creating indices and other 
schema definition tasks using the Java API? Most of the documentation 
centers around SQL, but I would prefer to use the native Java API if at all 
possible. The Java API includes methods that *seem to* expose indices and 
other elements of the schema but I can't quite get them to work for me.

We are using OrientDB 2.0, but forward-compatible documentation from 1.x 
would also be helpful.

Thanks in advance!

-- John J. Szucs
