[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string
Github user asfgit closed the pull request at: https://github.com/apache/incubator-rya/pull/199 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string
Github user pujav65 commented on a diff in the pull request:

    https://github.com/apache/incubator-rya/pull/199#discussion_r132249498

    --- Diff: dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/dao/SimpleMongoDBStorageStrategy.java ---
    @@ -53,8 +54,11 @@
         public static final String OBJECT_TYPE_VALUE = XMLSchema.ANYURI.stringValue();
         public static final String CONTEXT = "context";
         public static final String PREDICATE = "predicate";
    -    public static final String OBJECT = "object";
    +    public static final String PREDICATE_HASH = "predicate_hash";
    +    public static final String OBJECT = "object_original";
    --- End diff --

    I'm pretty sure Mongo further condenses the data, so I'm not sure hashing is necessary in order for it to store in memory. You're adding a lot of overhead to query. I'm ok with adding it now if you think it's necessary.
[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string
Github user amihalik commented on a diff in the pull request:

    https://github.com/apache/incubator-rya/pull/199#discussion_r132245115

    --- Diff: dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/dao/SimpleMongoDBStorageStrategy.java ---
    @@ -64,14 +68,14 @@
         @Override
         public void createIndices(final DBCollection coll){
             BasicDBObject doc = new BasicDBObject();
    -        doc.put(SUBJECT, 1);
    -        doc.put(PREDICATE, 1);
    +        doc.put(SUBJECT_HASH, 1);
    +        doc.put(PREDICATE_HASH, 1);
             coll.createIndex(doc);
    --- End diff --

    @pujav65 thanks. @isper3at clearly a bug. please add OBJECT_HASH, OBJECT_TYPE_HASH to the first index.
[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string
Github user pujav65 commented on a diff in the pull request:

    https://github.com/apache/incubator-rya/pull/199#discussion_r132243060

    --- Diff: dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/dao/SimpleMongoDBStorageStrategy.java ---
    @@ -64,14 +68,14 @@
         @Override
         public void createIndices(final DBCollection coll){
             BasicDBObject doc = new BasicDBObject();
    -        doc.put(SUBJECT, 1);
    -        doc.put(PREDICATE, 1);
    +        doc.put(SUBJECT_HASH, 1);
    +        doc.put(PREDICATE_HASH, 1);
             coll.createIndex(doc);
    --- End diff --

    When the Mongo db backend was first implemented, you could only do indices over two fields -- the first is the primary index, the second the secondary index. That may have changed since. The indices we originally had were subject, predicate, object, and then subject/predicate, predicate/object, and object/subject. Not including object type might be a bug, but I had thought that was addressed at some point. Also, one could argue that the single-field indices were redundant -- I had wanted to test that but never got around to it. If you can now index over more than two fields, then we might want to revisit this.
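The redundancy argument above can be checked with a small model. MongoDB can use a compound index for queries on a prefix of its fields, so the sketch below tests whether each single-field query is already served by one of the three two-field indices listed in the comment. It is a simplified illustration only: the class and method names are hypothetical, no MongoDB driver is involved, and the real query planner considers more than exact prefixes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Rya: models only the "index prefix" rule.
final class IndexPrefixCheck {

    /** True when the queried fields are exactly the first k fields of the index. */
    static boolean isPrefix(List<String> indexFields, Set<String> queryFields) {
        int k = queryFields.size();
        return k > 0
                && k <= indexFields.size()
                && new HashSet<>(indexFields.subList(0, k)).equals(queryFields);
    }

    /** True when at least one index has the queried fields as a prefix. */
    static boolean coveredByAny(List<List<String>> indices, Set<String> queryFields) {
        for (List<String> index : indices) {
            if (isPrefix(index, queryFields)) {
                return true;
            }
        }
        return false;
    }

    static Set<String> fields(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        // The three compound indices named in the comment above.
        List<List<String>> compound = Arrays.asList(
                Arrays.asList("subject", "predicate"),
                Arrays.asList("predicate", "object"),
                Arrays.asList("object", "subject"));

        for (String field : Arrays.asList("subject", "predicate", "object")) {
            System.out.println(field + " covered: " + coveredByAny(compound, fields(field)));
        }
    }
}
```

Under this model each of subject, predicate, and object is the leading field of one compound index, which is consistent with the suggestion that the separate single-field indices add nothing. A full subject/predicate/object lookup is not a prefix of any two-field index, which is the gap a wider SPO index would close.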
[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string
Github user amihalik commented on a diff in the pull request:

    https://github.com/apache/incubator-rya/pull/199#discussion_r132242206

    --- Diff: dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/dao/SimpleMongoDBStorageStrategy.java ---
    @@ -53,8 +54,11 @@
         public static final String OBJECT_TYPE_VALUE = XMLSchema.ANYURI.stringValue();
         public static final String CONTEXT = "context";
         public static final String PREDICATE = "predicate";
    -    public static final String OBJECT = "object";
    +    public static final String PREDICATE_HASH = "predicate_hash";
    +    public static final String OBJECT = "object_original";
    --- End diff --

    @pujav65 I'm concerned about index size. Please hash everything. If you want another ticket for "please hash everything" I'm fine with that, but let's knock that out while @isper3at is cleaning this stuff up. The key thing with Mongo is to get the index to fit in memory, so let's do that.
[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string
Github user amihalik commented on a diff in the pull request:

    https://github.com/apache/incubator-rya/pull/199#discussion_r132235261

    --- Diff: dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/dao/SimpleMongoDBStorageStrategy.java ---
    @@ -64,14 +68,14 @@
         @Override
         public void createIndices(final DBCollection coll){
             BasicDBObject doc = new BasicDBObject();
    -        doc.put(SUBJECT, 1);
    -        doc.put(PREDICATE, 1);
    +        doc.put(SUBJECT_HASH, 1);
    +        doc.put(PREDICATE_HASH, 1);
             coll.createIndex(doc);
    -        doc = new BasicDBObject(PREDICATE, 1);
    -        doc.put(OBJECT, 1);
    +        doc = new BasicDBObject(PREDICATE_HASH, 1);
    +        doc.put(OBJECT_HASH, 1);
             doc.put(OBJECT_TYPE, 1);
             coll.createIndex(doc);
    -        doc = new BasicDBObject(OBJECT, 1);
    +        doc = new BasicDBObject(OBJECT_HASH, 1);
             doc.put(OBJECT_TYPE, 1);
             doc.put(SUBJECT, 1);
    --- End diff --

    SUBJECT_HASH
[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string
Github user amihalik commented on a diff in the pull request:

    https://github.com/apache/incubator-rya/pull/199#discussion_r132235567

    --- Diff: dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/dao/SimpleMongoDBStorageStrategy.java ---
    @@ -64,14 +68,14 @@
         @Override
         public void createIndices(final DBCollection coll){
             BasicDBObject doc = new BasicDBObject();
    -        doc.put(SUBJECT, 1);
    -        doc.put(PREDICATE, 1);
    +        doc.put(SUBJECT_HASH, 1);
    +        doc.put(PREDICATE_HASH, 1);
             coll.createIndex(doc);
    --- End diff --

    @pujav65 Looking over this index creation code... this seems like a bug... where's the SPO index? I think this first index should be SUBJECT_HASH, PREDICATE_HASH, OBJECT_HASH, OBJECT_TYPE_HASH
[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string
Github user amihalik commented on a diff in the pull request:

    https://github.com/apache/incubator-rya/pull/199#discussion_r132235041

    --- Diff: dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/dao/SimpleMongoDBStorageStrategy.java ---
    @@ -53,8 +54,11 @@
         public static final String OBJECT_TYPE_VALUE = XMLSchema.ANYURI.stringValue();
         public static final String CONTEXT = "context";
         public static final String PREDICATE = "predicate";
    -    public static final String OBJECT = "object";
    +    public static final String PREDICATE_HASH = "predicate_hash";
    +    public static final String OBJECT = "object_original";
    --- End diff --

    yep, might as well hash context and object type as well.
[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string
Github user amihalik commented on a diff in the pull request:

    https://github.com/apache/incubator-rya/pull/199#discussion_r132234427

    --- Diff: dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/dao/SimpleMongoDBStorageStrategy.java ---
    @@ -85,14 +89,14 @@ public DBObject getQuery(final RyaStatement stmt) {
             final RyaURI context = stmt.getContext();
             final BasicDBObject query = new BasicDBObject();
             if (subject != null){
    -            query.append(SUBJECT, subject.getData());
    +            query.append(SUBJECT_HASH, DigestUtils.sha256Hex(subject.getData()));
    --- End diff --

    I'll do some testing on this, but I'm guessing PRO: (1) smaller index size and (2) smaller messages over the wire. CON: Need to take care when println'ing the query.
[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string
Github user pujav65 commented on a diff in the pull request:

    https://github.com/apache/incubator-rya/pull/199#discussion_r132234315

    --- Diff: dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/dao/SimpleMongoDBStorageStrategy.java ---
    @@ -53,8 +54,11 @@
         public static final String OBJECT_TYPE_VALUE = XMLSchema.ANYURI.stringValue();
         public static final String CONTEXT = "context";
         public static final String PREDICATE = "predicate";
    -    public static final String OBJECT = "object";
    +    public static final String PREDICATE_HASH = "predicate_hash";
    +    public static final String OBJECT = "object_original";
    --- End diff --

    hey i don't think we need to hash predicates and subjects - just objects. objects are possibly literals which means they can have unspecified length (and in practice are likely to be very long -- sometimes people literally put books into comments which are object values). theoretically predicates and subjects are URIs which means that to be valid they are limited in length. no harm in doing it, it just adds a layer of indirection at query time.
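The "layer of indirection at query time" mentioned above can be sketched as follows: the caller supplies plain values, the query is built against the hash fields, and the readable values are then taken from the matched documents. This is a minimal sketch that uses a plain `Map` in place of the Mongo `BasicDBObject`. The field names `subject_hash` and `object_hash` are assumptions (only `predicate_hash` appears in the diff), the class and method names are hypothetical, and the hex helper is a JDK-only stand-in for the `DigestUtils.sha256Hex` call used in the pull request.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the Rya implementation.
final class HashedQuerySketch {
    static final String SUBJECT_HASH = "subject_hash";     // assumed field name
    static final String PREDICATE_HASH = "predicate_hash"; // taken from the diff
    static final String OBJECT_HASH = "object_hash";       // assumed field name

    /** Lowercase hex SHA-256 of the UTF-8 bytes of the value. */
    static String sha256Hex(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    /** Builds a query over the hash fields; a null argument means "match anything". */
    static Map<String, Object> getQuery(String subject, String predicate, String object) {
        Map<String, Object> query = new LinkedHashMap<>();
        if (subject != null) {
            query.put(SUBJECT_HASH, sha256Hex(subject));
        }
        if (predicate != null) {
            query.put(PREDICATE_HASH, sha256Hex(predicate));
        }
        if (object != null) {
            query.put(OBJECT_HASH, sha256Hex(object));
        }
        return query;
    }

    public static void main(String[] args) {
        // Only the predicate is bound; the printed query contains a hash, not the URI.
        System.out.println(getQuery(null, "http://example.org/knows", null));
    }
}
```

The printed query shows only a digest, which is the readability cost raised elsewhere in this thread: anyone logging queries needs to log the original values separately.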
[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string
Github user isper3at commented on a diff in the pull request:

    https://github.com/apache/incubator-rya/pull/199#discussion_r132230655

    --- Diff: dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/dao/SimpleMongoDBStorageStrategy.java ---
    @@ -85,14 +89,14 @@ public DBObject getQuery(final RyaStatement stmt) {
             final RyaURI context = stmt.getContext();
             final BasicDBObject query = new BasicDBObject();
             if (subject != null){
    -            query.append(SUBJECT, subject.getData());
    +            query.append(SUBJECT_HASH, DigestUtils.sha256Hex(subject.getData()));
    --- End diff --

    I can store as either. Not really sure if there are any pros-cons between the two
[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string
Github user isper3at commented on a diff in the pull request:

    https://github.com/apache/incubator-rya/pull/199#discussion_r132230502

    --- Diff: dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/dao/SimpleMongoDBStorageStrategy.java ---
    @@ -53,8 +54,11 @@
         public static final String OBJECT_TYPE_VALUE = XMLSchema.ANYURI.stringValue();
         public static final String CONTEXT = "context";
         public static final String PREDICATE = "predicate";
    -    public static final String OBJECT = "object";
    +    public static final String PREDICATE_HASH = "predicate_hash";
    +    public static final String OBJECT = "object_original";
    --- End diff --

    woops. I'll make it just object. did you want a hash for context as well?
[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string
Github user amihalik commented on a diff in the pull request:

    https://github.com/apache/incubator-rya/pull/199#discussion_r132191684

    --- Diff: dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/dao/SimpleMongoDBStorageStrategy.java ---
    @@ -53,8 +54,11 @@
         public static final String OBJECT_TYPE_VALUE = XMLSchema.ANYURI.stringValue();
         public static final String CONTEXT = "context";
         public static final String PREDICATE = "predicate";
    -    public static final String OBJECT = "object";
    +    public static final String PREDICATE_HASH = "predicate_hash";
    +    public static final String OBJECT = "object_original";
    --- End diff --

    Can you change this to just "object", or change "context", "predicate", and "subject" to "xxx_original"?
[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string
Github user amihalik commented on a diff in the pull request:

    https://github.com/apache/incubator-rya/pull/199#discussion_r132192953

    --- Diff: dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/dao/SimpleMongoDBStorageStrategy.java ---
    @@ -85,14 +89,14 @@ public DBObject getQuery(final RyaStatement stmt) {
             final RyaURI context = stmt.getContext();
             final BasicDBObject query = new BasicDBObject();
             if (subject != null){
    -            query.append(SUBJECT, subject.getData());
    +            query.append(SUBJECT_HASH, DigestUtils.sha256Hex(subject.getData()));
    --- End diff --

    Can we store/query in binary (32 bytes) vs hex string (64 bytes)?
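The 32 versus 64 figures in the question above follow directly from the digest size: SHA-256 yields 256 bits, which is 32 raw bytes, and hex encoding spends two characters per byte. The sketch below shows both encodings using only the JDK. The class name is hypothetical, and the hex helper is a stand-in that should produce the same lowercase string as `DigestUtils.sha256Hex`, though that equivalence is an assumption here rather than something tested against Commons Codec.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch comparing the two encodings of one SHA-256 digest.
final class HashEncodingSizes {

    /** Raw SHA-256 digest of the UTF-8 bytes of the value. */
    static byte[] sha256(String value) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    /** Lowercase hex encoding, two characters per input byte. */
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(Character.forDigit((b >> 4) & 0xF, 16));
            hex.append(Character.forDigit(b & 0xF, 16));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        byte[] raw = sha256("http://example.org/subject");
        System.out.println("binary: " + raw.length + " bytes");
        System.out.println("hex:    " + toHex(raw).length() + " characters");
    }
}
```

Storing the raw bytes halves the size of each indexed key, at the cost of values that are harder to read in logs and in the Mongo shell.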
[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string
GitHub user isper3at opened a pull request:

    https://github.com/apache/incubator-rya/pull/199

    RYA-316 Long OBJ string

    ## Description
    >What Changed?

    Hash the indexed object field with SHA256. This keeps the indexer from breaking when the object is longer than 1024 bytes.

    ### Tests
    >Coverage?

    Updated the tests with the new fields

    ### Links
    [Jira](https://issues.apache.org/jira/browse/RYA-316)

    ### Checklist
    - [ ] Code Review
    - [ ] Squash Commits

    People To Review
    @meiercaleb @amihalik @pujav65

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/isper3at/incubator-rya RYA-316

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-rya/pull/199.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #199

commit 94f6c9bad01d0cf7c2716895678692065796fa13
Author: isper3at
Date: 2017-08-07T17:28:46Z

    RYA-316 Long OBJ string

    Hash the indexed object field with SHA256. This keeps the indexer from breaking when the object is longer than 1024 bytes.
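The idea in the description can be illustrated with a document that keeps the original object value for reading and adds a fixed-length hash for indexing. This is a minimal sketch under stated assumptions: a `Map` stands in for the Mongo document, `object_hash` is an assumed field name, the class and method names are hypothetical, and the 1024-byte figure is taken from the description above rather than checked against any particular MongoDB version.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the Rya storage strategy.
final class HashedObjectDocument {
    static final String OBJECT = "object";
    static final String OBJECT_HASH = "object_hash"; // assumed field name
    static final int INDEX_KEY_LIMIT_BYTES = 1024;   // limit quoted in the PR description

    /** Lowercase hex SHA-256 of the UTF-8 bytes of the value. */
    static String sha256Hex(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    /** Keeps the original value for reads and adds a fixed-length hash to index on. */
    static Map<String, Object> toDocument(String objectValue) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put(OBJECT, objectValue);
        doc.put(OBJECT_HASH, sha256Hex(objectValue));
        return doc;
    }

    public static void main(String[] args) {
        // A literal well past the limit, standing in for a very long comment value.
        StringBuilder longLiteral = new StringBuilder();
        for (int i = 0; i < 5000; i++) {
            longLiteral.append('x');
        }
        Map<String, Object> doc = toDocument(longLiteral.toString());
        int valueBytes = ((String) doc.get(OBJECT)).getBytes(StandardCharsets.UTF_8).length;
        int hashBytes = ((String) doc.get(OBJECT_HASH)).getBytes(StandardCharsets.UTF_8).length;
        System.out.println("object value: " + valueBytes + " bytes");
        System.out.println("indexed hash: " + hashBytes + " bytes, under limit: "
                + (hashBytes <= INDEX_KEY_LIMIT_BYTES));
    }
}
```

Because the indexed value has the same length for any input, the size of the literal no longer affects whether the index entry can be written.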