[jira] [Comment Edited] (GORA-411) Add exists(key) to DataStore interface

2019-03-18 Thread John Mora (JIRA)


[ 
https://issues.apache.org/jira/browse/GORA-411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795698#comment-16795698
 ] 

John Mora edited comment on GORA-411 at 3/19/19 5:24 AM:
-

Hi [~alfonso.nishikawa] , [~lewismc].

I would like to work on this issue as a warm-up task for my GSoC 2019 
application.

I added the method _*public*_ _*boolean exists(K key) throws GoraException*_ in 
the *DataStore* interface and implemented a default behavior in the 
*DataStoreBase* class as follows.
{code:java}
  // Default fallback: treat the key as present when get() returns a record.
  @Override
  public boolean exists(K key) throws GoraException {
    return get(key, new String[0]) != null;
  }
{code}
And, for testing I added the following case:
{code:java}
  public static void testExistsEmployee(DataStore<String, Employee> dataStore)
    throws Exception {
    dataStore.createSchema();
    Employee employee = DataStoreTestUtil.createEmployee();
    String ssn = employee.getSsn().toString();
    // After put + flush, the key must be visible.
    dataStore.put(ssn, employee);
    dataStore.flush();
    assertTrue(dataStore.exists(ssn));
    // After delete + flush, the key must be gone.
    dataStore.delete(ssn);
    dataStore.flush();
    assertFalse(dataStore.exists(ssn));
  }
{code}
This naive approach seems to work (the tests pass), so I could next analyze each 
backend to find a more suitable custom implementation for it. First, though, I 
would like to know whether the test case above is sufficient for this new method. 
Do you know of other edge cases that should also be checked?
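
Some candidate edge cases, as a minimal sketch only: a key that was never written, repeated exists() calls that must not change the stored record, a delete that must not affect other keys, and a re-insert after a delete. The `MiniStore` class below is a hypothetical in-memory stand-in used so the sketch runs on its own; it is not Gora's DataStore, and real backends may behave differently (for example around flush timing).

```java
import java.util.HashMap;
import java.util.Map;

public class ExistsEdgeCases {

  /** Hypothetical in-memory stand-in for a data store; this is not Gora's API. */
  static class MiniStore<K, V> {
    private final Map<K, V> rows = new HashMap<>();

    void put(K key, V value) { rows.put(key, value); }

    V get(K key) { return rows.get(key); }

    boolean delete(K key) { return rows.remove(key) != null; }

    // Same fallback idea as the default above: a key exists if get() finds a row.
    boolean exists(K key) { return get(key) != null; }
  }

  /** Runs the candidate edge cases and returns true only if every check holds. */
  static boolean runChecks() {
    MiniStore<String, String> store = new MiniStore<>();
    boolean ok = true;

    // 1. A key that was never written must not exist.
    ok &= !store.exists("never-written");

    // 2. exists() is read-only: calling it twice leaves the record untouched.
    store.put("ssn-1", "employee");
    ok &= store.exists("ssn-1") && store.exists("ssn-1");
    ok &= "employee".equals(store.get("ssn-1"));

    // 3. Deleting one key must not affect another key.
    store.put("ssn-2", "other employee");
    store.delete("ssn-1");
    ok &= !store.exists("ssn-1") && store.exists("ssn-2");

    // 4. A key that is written again after a delete must exist again.
    store.put("ssn-1", "employee again");
    ok &= store.exists("ssn-1");

    return ok;
  }

  public static void main(String[] args) {
    System.out.println("all checks hold: " + runChecks());
  }
}
```

In a real test these checks would be written against the DataStore under test, with flush() calls wherever the backend buffers writes.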

Cheers,

John

 

 


> Add exists(key) to DataStore interface
> --------------------------------------
>
> Key: GORA-411
> URL: https://issues.apache.org/jira/browse/GORA-411
> Project: Apache Gora
> Issue Type: Improvement
> Components: gora-core, storage
> Reporter: Alfonso Nishikawa
> Priority: Minor
> Fix For: 0.9
>
>
> NUTCH-1679 needs to check whether certain rows exist, and the proposal there is 
> to use {{store.get(TableUtil.reverseUrl(url))}}.
> This will have a considerable impact on performance, since every column will 
> be fetched.
> Some datastores (like HBase) implement a call that just checks whether a row 
> exists, so no data is transferred over the network.
> If a datastore can't handle an "exists" call, it can default to a get.
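
The design described in the issue can be sketched roughly as follows. This is an illustration under stated assumptions, not Gora code: `SimpleStore`, `FetchingStore`, and `KeyOnlyStore` are hypothetical names, and the `rowsFetched` counter merely stands in for the cost of transferring a row over the network.

```java
import java.util.HashMap;
import java.util.Map;

public class ExistsFallbackSketch {

  /** Hypothetical, simplified store interface; this is not Gora's DataStore. */
  interface SimpleStore<K, V> {
    V get(K key);

    /** Fallback for backends with no cheaper check: fetch, then test for null. */
    default boolean exists(K key) {
      return get(key) != null;
    }
  }

  /** Backend without a native existence check; it inherits the get()-based fallback. */
  static class FetchingStore implements SimpleStore<String, String> {
    final Map<String, String> rows = new HashMap<>();
    int rowsFetched = 0; // counts full-row reads, standing in for network transfer

    @Override
    public String get(String key) {
      rowsFetched++;
      return rows.get(key);
    }
  }

  /** Backend with a key-only check; it overrides exists() and never reads the row. */
  static class KeyOnlyStore extends FetchingStore {
    @Override
    public boolean exists(String key) {
      return rows.containsKey(key);
    }
  }

  /** Returns how many full-row reads a single exists() call costs on the given store. */
  static int fetchesForExists(FetchingStore store) {
    store.rows.put("row-1", "payload");
    store.exists("row-1");
    return store.rowsFetched;
  }

  public static void main(String[] args) {
    System.out.println("fallback fetches: " + fetchesForExists(new FetchingStore()));
    System.out.println("override fetches: " + fetchesForExists(new KeyOnlyStore()));
  }
}
```

Keeping the fallback in one shared place means every backend answers exists() correctly from the start, and only backends with a cheaper native check need to override it.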



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[GitHub] [gora] alfonsonishikawa commented on a change in pull request #135: Goraexplorer needed changes

2019-03-18 Thread GitBox
alfonsonishikawa commented on a change in pull request #135: Goraexplorer 
needed changes
URL: https://github.com/apache/gora/pull/135#discussion_r266680620
 
 

 ##
 File path: 
gora-pig/src/test/java/org/apache/gora/pig/GoraStorageTest.java-disabled
 ##
 @@ -0,0 +1,352 @@
+package org.apache.gora.pig;
 
 Review comment:
   > Were you able to try the same with HBase 2 upgrade?
   
   Hi! No. I will try, though. Thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [gora] djkevincr commented on issue #135: Goraexplorer needed changes

2019-03-18 Thread GitBox
djkevincr commented on issue #135: Goraexplorer needed changes
URL: https://github.com/apache/gora/pull/135#issuecomment-474034153
 
 
   @alfonsonishikawa One concern: I noticed changes to the record.vm Velocity 
template. I hope you have regenerated all the Avro data bean classes, to avoid 
inconsistencies caused by multiple updates to the template.
   This is really great :) as a first step. We can continue this work with the 
improvements you suggested to me offline. @lewismc Do you have any concerns 
from your review of this PR?




[GitHub] [gora] djkevincr commented on a change in pull request #135: Goraexplorer needed changes

2019-03-18 Thread GitBox
djkevincr commented on a change in pull request #135: Goraexplorer needed 
changes
URL: https://github.com/apache/gora/pull/135#discussion_r266567600
 
 

 ##
 File path: 
gora-pig/src/test/java/org/apache/gora/pig/GoraStorageTest.java-disabled
 ##
 @@ -0,0 +1,352 @@
+package org.apache.gora.pig;
 
 Review comment:
   Were you able to try the same with HBase 2 upgrade?




[GitHub] [gora] djkevincr commented on issue #135: Goraexplorer needed changes

2019-03-18 Thread GitBox
djkevincr commented on issue #135: Goraexplorer needed changes
URL: https://github.com/apache/gora/pull/135#issuecomment-474029378
 
 
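   Locally tested the PR, build passes without any test failures.
   
   [INFO] ------------------------------------------------------------------------
   [INFO] Reactor Summary:
   [INFO] 
   [INFO] Apache Gora ................................ SUCCESS [  2.381 s]
   [INFO] Apache Gora :: Compiler .................... SUCCESS [  3.077 s]
   [INFO] Apache Gora :: Compiler-CLI ................ SUCCESS [  1.385 s]
   [INFO] Apache Gora :: Core ........................ SUCCESS [01:56 min]
   [INFO] Apache Gora :: Pig ......................... SUCCESS [  3.235 s]
   [INFO] Apache Gora :: Accumulo .................... SUCCESS [08:07 min]
   [INFO] Apache Gora :: HBase ....................... SUCCESS [03:24 min]
   [INFO] Apache Gora :: Cassandra - CQL ............. SUCCESS [01:53 min]
   [INFO] Apache Gora :: GoraCI ...................... SUCCESS [  3.998 s]
   [INFO] Apache Gora :: Infinispan .................. SUCCESS [01:22 min]
   [INFO] Apache Gora :: JCache ...................... SUCCESS [01:21 min]
   [INFO] Apache Gora :: OrientDB .................... SUCCESS [01:48 min]
   [INFO] Apache Gora :: Dynamodb .................... SUCCESS [  4.441 s]
   [INFO] Apache Gora :: CouchDB ..................... SUCCESS [  4.872 s]
   [INFO] Apache Gora :: Maven Plugin ................ SUCCESS [  3.021 s]
   [INFO] Apache Gora :: MongoDB ..................... SUCCESS [02:04 min]
   [INFO] Apache Gora :: Solr ........................ SUCCESS [02:59 min]
   [INFO] Apache Gora :: Aerospike ................... SUCCESS [  2.849 s]
   [INFO] Apache Gora :: Ignite ...................... SUCCESS [02:56 min]
   [INFO] Apache Gora :: Tutorial .................... SUCCESS [  6.486 s]
   [INFO] Apache Gora :: Sources-Dist ................ SUCCESS [  0.364 s]
   [INFO] ------------------------------------------------------------------------
   [INFO] BUILD SUCCESS
   [INFO] ------------------------------------------------------------------------
   [INFO] Total time: 28:32 min
   [INFO] Finished at: 2019-03-18T23:14:41+05:30
   [INFO] Final Memory: 101M/1679M
   [INFO] ------------------------------------------------------------------------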




[jira] [Created] (GORA-555) Improve Lucene query implementation with NumericRangeQuery

2019-03-18 Thread Kevin Ratnasekera (JIRA)
Kevin Ratnasekera created GORA-555:
--

 Summary: Improve Lucene query implementation with 
NumericRangeQuery 
 Key: GORA-555
 URL: https://issues.apache.org/jira/browse/GORA-555
 Project: Apache Gora
  Issue Type: Improvement
Reporter: Kevin Ratnasekera


There performance benefits around NumericRangeQuery. Please notice comment on 
LuceneQuery implementation.
```
  //TODO: Change this to a NumericRangeQuery when necessary (it's faster)
  String lower = null;
  String upper = null;
  if (getStartKey() != null) {
    //Do we need to escape the term?
    lower = getStartKey().toString();
  }
  if (getEndKey() != null) {
    upper = getEndKey().toString();
  }
  if (upper == null && lower == null) {
    q = new MatchAllDocsQuery();
  } else {
    q = TermRangeQuery.newStringRange(pk, lower, upper, true, true);
  }
```
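
Besides performance, there may be an ordering difference to keep in mind: a string term range compares terms lexicographically, so keys that are numbers stored as plain strings do not necessarily fall into the range a numeric comparison would give. The sketch below illustrates this with the JDK only and does not call Lucene; `inStringRange` and `inNumericRange` are hypothetical helpers, and `String.compareTo` is assumed to be a fair stand-in for term ordering on ASCII digit keys.

```java
public class RangeOrderSketch {

  /** Inclusive range check using lexicographic string comparison. */
  static boolean inStringRange(String key, String lower, String upper) {
    return key.compareTo(lower) >= 0 && key.compareTo(upper) <= 0;
  }

  /** Inclusive range check using numeric comparison. */
  static boolean inNumericRange(long key, long lower, long upper) {
    return key >= lower && key <= upper;
  }

  public static void main(String[] args) {
    // "100" sorts between "10" and "20" as a string, but 100 is outside [10, 20].
    System.out.println(inStringRange("100", "10", "20") + " vs "
        + inNumericRange(100, 10, 20));
    // "15" sorts before "9" as a string, but 15 is inside [9, 20].
    System.out.println(inStringRange("15", "9", "20") + " vs "
        + inNumericRange(15, 9, 20));
  }
}
```

Whether this matters here depends on how the keys are encoded when they are indexed, which the snippet above does not show.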





[jira] [Updated] (GORA-555) Improve Lucene query implementation with NumericRangeQuery

2019-03-18 Thread Kevin Ratnasekera (JIRA)


 [ 
https://issues.apache.org/jira/browse/GORA-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Ratnasekera updated GORA-555:
---
Description: 
There are performance benefits around NumericRangeQuery. Please notice comment 
on LuceneQuery implementation.
{code}
  //TODO: Change this to a NumericRangeQuery when necessary (it's faster)
  String lower = null;
  String upper = null;
  if (getStartKey() != null) {
    //Do we need to escape the term?
    lower = getStartKey().toString();
  }
  if (getEndKey() != null) {
    upper = getEndKey().toString();
  }
  if (upper == null && lower == null) {
    q = new MatchAllDocsQuery();
  } else {
    q = TermRangeQuery.newStringRange(pk, lower, upper, true, true);
  }
{code}

  was:
There performance benefits around NumericRangeQuery. Please notice comment on 
LuceneQuery implementation.
{code}
  //TODO: Change this to a NumericRangeQuery when necessary (it's faster)
  String lower = null;
  String upper = null;
  if (getStartKey() != null) {
    //Do we need to escape the term?
    lower = getStartKey().toString();
  }
  if (getEndKey() != null) {
    upper = getEndKey().toString();
  }
  if (upper == null && lower == null) {
    q = new MatchAllDocsQuery();
  } else {
    q = TermRangeQuery.newStringRange(pk, lower, upper, true, true);
  }
{code}


> Improve Lucene query implementation with NumericRangeQuery 
> ---
>
> Key: GORA-555
> URL: https://issues.apache.org/jira/browse/GORA-555
> Project: Apache Gora
>  Issue Type: Improvement
>Reporter: Kevin Ratnasekera
>Priority: Major





[GitHub] [gora] djkevincr edited a comment on issue #152: GORA-266 Lucene datastore for Gora - lewismc

2019-03-18 Thread GitBox
djkevincr edited a comment on issue #152: GORA-266 Lucene datastore for Gora - 
lewismc
URL: https://github.com/apache/gora/pull/152#issuecomment-473925982
 
 
   @lewismc I have created issue [1] for NumericRangeQuery. Let's address this 
separately. I think this PR is in good shape to be merged.
   All major test cases pass without any issues. 
   [1] https://issues.apache.org/jira/browse/GORA-555




[GitHub] [gora] djkevincr commented on issue #152: GORA-266 Lucene datastore for Gora - lewismc

2019-03-18 Thread GitBox
djkevincr commented on issue #152: GORA-266 Lucene datastore for Gora - lewismc
URL: https://github.com/apache/gora/pull/152#issuecomment-473925982
 
 
   @lewismc I have created issue [1] for NumericRangeQuery. I think this PR is 
in good shape to be merged.
   All major test cases pass without any issues. 
   [1] https://issues.apache.org/jira/browse/GORA-555




[jira] [Updated] (GORA-555) Improve Lucene query implementation with NumericRangeQuery

2019-03-18 Thread Kevin Ratnasekera (JIRA)


 [ 
https://issues.apache.org/jira/browse/GORA-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Ratnasekera updated GORA-555:
---
Description: 
There performance benefits around NumericRangeQuery. Please notice comment on 
LuceneQuery implementation.
{code}
  //TODO: Change this to a NumericRangeQuery when necessary (it's faster)
  String lower = null;
  String upper = null;
  if (getStartKey() != null) {
    //Do we need to escape the term?
    lower = getStartKey().toString();
  }
  if (getEndKey() != null) {
    upper = getEndKey().toString();
  }
  if (upper == null && lower == null) {
    q = new MatchAllDocsQuery();
  } else {
    q = TermRangeQuery.newStringRange(pk, lower, upper, true, true);
  }
{code}

  was:
There performance benefits around NumericRangeQuery. Please notice comment on 
LuceneQuery implementation.
```
  //TODO: Change this to a NumericRangeQuery when necessary (it's faster)
  String lower = null;
  String upper = null;
  if (getStartKey() != null) {
    //Do we need to escape the term?
    lower = getStartKey().toString();
  }
  if (getEndKey() != null) {
    upper = getEndKey().toString();
  }
  if (upper == null && lower == null) {
    q = new MatchAllDocsQuery();
  } else {
    q = TermRangeQuery.newStringRange(pk, lower, upper, true, true);
  }
```







[jira] [Created] (GORA-554) Upgrade Solr dependency to latest

2019-03-18 Thread Kevin Ratnasekera (JIRA)
Kevin Ratnasekera created GORA-554:
--

 Summary: Upgrade Solr dependency to latest
 Key: GORA-554
 URL: https://issues.apache.org/jira/browse/GORA-554
 Project: Apache Gora
  Issue Type: Improvement
Reporter: Kevin Ratnasekera


The current stable version of Solr is 8.0.0.





[GitHub] [gora] djkevincr closed pull request #131: GORA-266 Lucene datastore for Gora

2019-03-18 Thread GitBox
djkevincr closed pull request #131: GORA-266 Lucene datastore for Gora
URL: https://github.com/apache/gora/pull/131
 
 
   




[GitHub] [gora] djkevincr commented on issue #131: GORA-266 Lucene datastore for Gora

2019-03-18 Thread GitBox
djkevincr commented on issue #131: GORA-266 Lucene datastore for Gora
URL: https://github.com/apache/gora/pull/131#issuecomment-473918146
 
 
   Closing this PR after submitting a more up-to-date version of the same work in 
https://github.com/apache/gora/pull/152

