[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string

2017-08-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-rya/pull/199


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string

2017-08-09 Thread pujav65
Github user pujav65 commented on a diff in the pull request:

https://github.com/apache/incubator-rya/pull/199#discussion_r132249498
  
--- Diff: 
dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/dao/SimpleMongoDBStorageStrategy.java
 ---
@@ -53,8 +54,11 @@
 public static final String OBJECT_TYPE_VALUE = XMLSchema.ANYURI.stringValue();
 public static final String CONTEXT = "context";
 public static final String PREDICATE = "predicate";
-public static final String OBJECT = "object";
+public static final String PREDICATE_HASH = "predicate_hash";
+public static final String OBJECT = "object_original";
--- End diff --

I'm pretty sure Mongo further condenses the data, so I'm not sure hashing 
is necessary for the index to fit in memory.  It also adds a lot of 
overhead at query time.  I'm OK with adding it now if you think it's necessary.




[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string

2017-08-09 Thread amihalik
Github user amihalik commented on a diff in the pull request:

https://github.com/apache/incubator-rya/pull/199#discussion_r132245115
  
--- Diff: 
dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/dao/SimpleMongoDBStorageStrategy.java
 ---
@@ -64,14 +68,14 @@
 @Override
 public void createIndices(final DBCollection coll){
 BasicDBObject doc = new BasicDBObject();
-doc.put(SUBJECT, 1);
-doc.put(PREDICATE, 1);
+doc.put(SUBJECT_HASH, 1);
+doc.put(PREDICATE_HASH, 1);
 coll.createIndex(doc);
--- End diff --

@pujav65 thanks.

@isper3at this is clearly a bug.  Please add OBJECT_HASH and OBJECT_TYPE_HASH 
to the first index.




[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string

2017-08-09 Thread pujav65
Github user pujav65 commented on a diff in the pull request:

https://github.com/apache/incubator-rya/pull/199#discussion_r132243060
  
--- Diff: 
dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/dao/SimpleMongoDBStorageStrategy.java
 ---
@@ -64,14 +68,14 @@
 @Override
 public void createIndices(final DBCollection coll){
 BasicDBObject doc = new BasicDBObject();
-doc.put(SUBJECT, 1);
-doc.put(PREDICATE, 1);
+doc.put(SUBJECT_HASH, 1);
+doc.put(PREDICATE_HASH, 1);
 coll.createIndex(doc);
--- End diff --

When the MongoDB backend was first implemented, you could only index over 
two fields: the first is the primary index, the second the secondary index.  
That may have changed since.  The indices we originally had were subject, 
predicate, object, and then subject/predicate, predicate/object, and 
object/subject.  Not including object type might be a bug, but I had thought 
that was addressed at some point.  Also, one could argue that the single-field 
indices were redundant; I had wanted to test this but never got around to it.
If you can now index over more than two fields, then we might want to 
revisit this.
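
The redundancy argument above can be illustrated with a small sketch. It assumes a simplified model of MongoDB's documented prefix behavior, in which a compound index can serve an equality query whose fields are exactly the leading fields of that index. The class name `IndexPrefixDemo`, the method `servedByPrefix`, and the plain field names are hypothetical and are not part of the Rya code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class IndexPrefixDemo {
    /**
     * Simplified model (an assumption, not the real query planner): the query
     * is served when its fields are exactly the first k fields of the index.
     */
    static boolean servedByPrefix(List<String> indexFields, Set<String> queryFields) {
        if (queryFields.isEmpty() || queryFields.size() > indexFields.size()) {
            return false;
        }
        Set<String> prefix = new HashSet<>(indexFields.subList(0, queryFields.size()));
        return prefix.equals(queryFields);
    }

    public static void main(String[] args) {
        // The three two-field indices listed in the comment above.
        List<List<String>> compound = Arrays.asList(
                Arrays.asList("subject", "predicate"),
                Arrays.asList("predicate", "object"),
                Arrays.asList("object", "subject"));

        // Check whether each single-field query is already covered.
        for (String field : Arrays.asList("subject", "predicate", "object")) {
            boolean covered = false;
            for (List<String> index : compound) {
                covered |= servedByPrefix(index, Collections.singleton(field));
            }
            System.out.println(field + " covered by a compound index: " + covered);
        }
    }
}
```

Under this model each single field is the leading field of one of the two-field indices, which is the sense in which the single-field indices could be considered redundant. Whether dropping them is safe in practice would still need the testing mentioned above.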




[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string

2017-08-09 Thread amihalik
Github user amihalik commented on a diff in the pull request:

https://github.com/apache/incubator-rya/pull/199#discussion_r132242206
  
--- Diff: 
dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/dao/SimpleMongoDBStorageStrategy.java
 ---
@@ -53,8 +54,11 @@
 public static final String OBJECT_TYPE_VALUE = XMLSchema.ANYURI.stringValue();
 public static final String CONTEXT = "context";
 public static final String PREDICATE = "predicate";
-public static final String OBJECT = "object";
+public static final String PREDICATE_HASH = "predicate_hash";
+public static final String OBJECT = "object_original";
--- End diff --

@pujav65 I'm concerned about index size.  Please hash everything.  If you 
want another ticket for "please hash everything" I'm fine with that, but let's 
knock that out while @isper3at is cleaning this stuff up.  The key thing with 
Mongo is to get the index to fit in memory, so let's do that.




[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string

2017-08-09 Thread amihalik
Github user amihalik commented on a diff in the pull request:

https://github.com/apache/incubator-rya/pull/199#discussion_r132235261
  
--- Diff: 
dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/dao/SimpleMongoDBStorageStrategy.java
 ---
@@ -64,14 +68,14 @@
 @Override
 public void createIndices(final DBCollection coll){
 BasicDBObject doc = new BasicDBObject();
-doc.put(SUBJECT, 1);
-doc.put(PREDICATE, 1);
+doc.put(SUBJECT_HASH, 1);
+doc.put(PREDICATE_HASH, 1);
 coll.createIndex(doc);
-doc = new BasicDBObject(PREDICATE, 1);
-doc.put(OBJECT, 1);
+doc = new BasicDBObject(PREDICATE_HASH, 1);
+doc.put(OBJECT_HASH, 1);
 doc.put(OBJECT_TYPE, 1);
 coll.createIndex(doc);
-doc = new BasicDBObject(OBJECT, 1);
+doc = new BasicDBObject(OBJECT_HASH, 1);
 doc.put(OBJECT_TYPE, 1);
 doc.put(SUBJECT, 1);
--- End diff --

This should be SUBJECT_HASH.




[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string

2017-08-09 Thread amihalik
Github user amihalik commented on a diff in the pull request:

https://github.com/apache/incubator-rya/pull/199#discussion_r132235567
  
--- Diff: 
dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/dao/SimpleMongoDBStorageStrategy.java
 ---
@@ -64,14 +68,14 @@
 @Override
 public void createIndices(final DBCollection coll){
 BasicDBObject doc = new BasicDBObject();
-doc.put(SUBJECT, 1);
-doc.put(PREDICATE, 1);
+doc.put(SUBJECT_HASH, 1);
+doc.put(PREDICATE_HASH, 1);
 coll.createIndex(doc);
--- End diff --

@pujav65 Looking over this index creation code, this seems like a bug: 
where's the SPO index?  I think this first index should be SUBJECT_HASH, 
PREDICATE_HASH, OBJECT_HASH, OBJECT_TYPE_HASH.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string

2017-08-09 Thread amihalik
Github user amihalik commented on a diff in the pull request:

https://github.com/apache/incubator-rya/pull/199#discussion_r132235041
  
--- Diff: 
dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/dao/SimpleMongoDBStorageStrategy.java
 ---
@@ -53,8 +54,11 @@
 public static final String OBJECT_TYPE_VALUE = XMLSchema.ANYURI.stringValue();
 public static final String CONTEXT = "context";
 public static final String PREDICATE = "predicate";
-public static final String OBJECT = "object";
+public static final String PREDICATE_HASH = "predicate_hash";
+public static final String OBJECT = "object_original";
--- End diff --

yep, might as well hash context and object type as well.




[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string

2017-08-09 Thread pujav65
Github user pujav65 commented on a diff in the pull request:

https://github.com/apache/incubator-rya/pull/199#discussion_r132234315
  
--- Diff: 
dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/dao/SimpleMongoDBStorageStrategy.java
 ---
@@ -53,8 +54,11 @@
 public static final String OBJECT_TYPE_VALUE = XMLSchema.ANYURI.stringValue();
 public static final String CONTEXT = "context";
 public static final String PREDICATE = "predicate";
-public static final String OBJECT = "object";
+public static final String PREDICATE_HASH = "predicate_hash";
+public static final String OBJECT = "object_original";
--- End diff --

Hey, I don't think we need to hash predicates and subjects, just objects.  
Objects may be literals, which means they can have unbounded length (and in 
practice are likely to be very long; sometimes people literally put books 
into comments, which are object values).  In theory, predicates and subjects 
are URIs, which means that to be valid they are limited in length.  There's no 
harm in doing it; it just adds a layer of indirection at query time.
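
As a rough illustration of that indirection, the sketch below stores a long literal next to its SHA-256 hash and builds a query that matches on the hash instead of the literal. It uses only the JDK rather than the `DigestUtils` call from the diff. The class `HashedObjectDemo`, its helper methods, and the field name `object_hash` (guessed by analogy with `predicate_hash`) are assumptions made for this example.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

class HashedObjectDemo {
    // Illustrative field names; "object_hash" is an assumption.
    static final String OBJECT = "object";
    static final String OBJECT_HASH = "object_hash";

    /** SHA-256 of the UTF-8 bytes as a zero-padded, 64-character hex string. */
    static String sha256Hex(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            return String.format("%064x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on the Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Stored form: the original value plus its fixed-length hash (the indexed field). */
    static Map<String, String> toDocument(String object) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put(OBJECT, object);
        doc.put(OBJECT_HASH, sha256Hex(object));
        return doc;
    }

    /** Query form: the caller's value is hashed first, then matched on the hash field. */
    static Map<String, String> toQuery(String object) {
        Map<String, String> query = new LinkedHashMap<>();
        query.put(OBJECT_HASH, sha256Hex(object));
        return query;
    }

    public static void main(String[] args) {
        // Build a literal well past the 1024-byte limit cited in the PR description.
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            literal.append("0123456789");
        }
        Map<String, String> doc = toDocument(literal.toString());
        Map<String, String> query = toQuery(literal.toString());

        System.out.println("original bytes: "
                + doc.get(OBJECT).getBytes(StandardCharsets.UTF_8).length);
        System.out.println("hash chars:     " + doc.get(OBJECT_HASH).length());
        System.out.println("query matches:  "
                + query.get(OBJECT_HASH).equals(doc.get(OBJECT_HASH)));
    }
}
```

The indexed value has the same length no matter how long the literal is. The cost is that every lookup by value has to hash the value first.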




[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string

2017-08-09 Thread isper3at
Github user isper3at commented on a diff in the pull request:

https://github.com/apache/incubator-rya/pull/199#discussion_r132230655
  
--- Diff: 
dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/dao/SimpleMongoDBStorageStrategy.java
 ---
@@ -85,14 +89,14 @@ public DBObject getQuery(final RyaStatement stmt) {
 final RyaURI context = stmt.getContext();
 final BasicDBObject query = new BasicDBObject();
 if (subject != null){
-query.append(SUBJECT, subject.getData());
+query.append(SUBJECT_HASH, DigestUtils.sha256Hex(subject.getData()));
--- End diff --

I can store it as either.  I'm not really sure whether there are any pros 
or cons between the two.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string

2017-08-09 Thread isper3at
Github user isper3at commented on a diff in the pull request:

https://github.com/apache/incubator-rya/pull/199#discussion_r132230502
  
--- Diff: 
dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/dao/SimpleMongoDBStorageStrategy.java
 ---
@@ -53,8 +54,11 @@
 public static final String OBJECT_TYPE_VALUE = XMLSchema.ANYURI.stringValue();
 public static final String CONTEXT = "context";
 public static final String PREDICATE = "predicate";
-public static final String OBJECT = "object";
+public static final String PREDICATE_HASH = "predicate_hash";
+public static final String OBJECT = "object_original";
--- End diff --

Whoops.  I'll make it just "object".  Did you want a hash for context as well?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string

2017-08-09 Thread amihalik
Github user amihalik commented on a diff in the pull request:

https://github.com/apache/incubator-rya/pull/199#discussion_r132191684
  
--- Diff: 
dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/dao/SimpleMongoDBStorageStrategy.java
 ---
@@ -53,8 +54,11 @@
 public static final String OBJECT_TYPE_VALUE = XMLSchema.ANYURI.stringValue();
 public static final String CONTEXT = "context";
 public static final String PREDICATE = "predicate";
-public static final String OBJECT = "object";
+public static final String PREDICATE_HASH = "predicate_hash";
+public static final String OBJECT = "object_original";
--- End diff --

Can you change this to just "object", or change "context", "predicate", and 
"subject" to "xxx_original"?




[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string

2017-08-09 Thread amihalik
Github user amihalik commented on a diff in the pull request:

https://github.com/apache/incubator-rya/pull/199#discussion_r132192953
  
--- Diff: 
dao/mongodb.rya/src/main/java/org/apache/rya/mongodb/dao/SimpleMongoDBStorageStrategy.java
 ---
@@ -85,14 +89,14 @@ public DBObject getQuery(final RyaStatement stmt) {
 final RyaURI context = stmt.getContext();
 final BasicDBObject query = new BasicDBObject();
 if (subject != null){
-query.append(SUBJECT, subject.getData());
+query.append(SUBJECT_HASH, DigestUtils.sha256Hex(subject.getData()));
--- End diff --

Can we store/query in binary (32 bytes) vs hex string (64 bytes)?
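
The size difference in question can be checked with a small sketch that uses only the JDK. The class `HashEncodingDemo`, its helper methods, and the sample URI are hypothetical. How the raw bytes would actually be written to Mongo is not shown here. They would presumably need to be stored as a BSON binary value rather than a string.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class HashEncodingDemo {
    /** Raw SHA-256 digest of the UTF-8 bytes of a string: always 32 bytes. */
    static byte[] sha256(String value) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on the Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Lowercase hex encoding: two characters per digest byte. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] raw = sha256("http://example.org/subject"); // hypothetical subject URI
        String hex = toHex(raw);
        System.out.println("binary length: " + raw.length);
        System.out.println("hex length:    " + hex.length());
    }
}
```

Hex encoding doubles the size of every indexed hash, so the binary form would halve the key size at the cost of less readable values when inspecting documents by hand.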


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-rya pull request #199: RYA-316 Long OBJ string

2017-08-07 Thread isper3at
GitHub user isper3at opened a pull request:

https://github.com/apache/incubator-rya/pull/199

RYA-316 Long OBJ string

## Description
>What Changed?

Hash the indexed object field with SHA256.
This will allow the indexer not to break
if the object is longer than 1024 bytes.

### Tests
>Coverage?

Updated the tests with the new fields

### Links
[Jira](https://issues.apache.org/jira/browse/RYA-316)

### Checklist
- [ ] Code Review
- [ ] Squash Commits

 People To Review
@meiercaleb 
@amihalik 
@pujav65 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/isper3at/incubator-rya RYA-316

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-rya/pull/199.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #199


commit 94f6c9bad01d0cf7c2716895678692065796fa13
Author: isper3at 
Date:   2017-08-07T17:28:46Z

RYA-316 Long OBJ string

Hash the indexed object field with SHA256.
This will allow the indexer not to break
if the object is longer than 1024 bytes.



